Greedy Algorithms
1. Greedy Algorithms

Many real-world problems are optimization problems in that they attempt to find an optimal solution among many possible candidate solutions. A familiar scenario is the change-making problem that we often encounter at a cash register: receiving the fewest number of coins to make change after paying the bill for a purchase. For example, if the purchase is worth $5.27, how many coins, and which coins, does a cash register return after the customer pays $6?

The Make-Change algorithm: For a given amount (e.g. $0.73), use as many quarters ($0.25) as possible without exceeding the amount. Use as many dimes ($0.10) as possible for the remainder, then use as many nickels ($0.05) as possible. Finally, use pennies ($0.01) for the rest.
2. Example

To make change for the amount x = 67 (cents): use q = ⌊x/25⌋ = 2 quarters. The remainder is x − 25q = 17, for which we use d = ⌊17/10⌋ = 1 dime. Then the remainder is 17 − 10d = 7, so we use n = ⌊7/5⌋ = 1 nickel. Finally, the remainder is 7 − 5n = 2, which requires p = ⌊2/1⌋ = 2 pennies. The total number of coins used is q + d + n + p = 6.

Note: The above algorithm is optimal in that it uses the fewest number of coins among all possible ways to make change for a given amount. (This fact can be proven formally.) However, this depends on the denominations of the US currency system. For example, try a system that uses denominations of 1-cent, 6-cent, and 7-cent coins, and try to make change for x = 18 cents. The greedy strategy uses 2 7-cent coins and 4 1-cent coins, for a total of 6 coins. However, the optimal solution is to use 3 6-cent coins (see the sketch below).
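
Below is a minimal Python sketch of the greedy Make-Change strategy; the function name make_change and the all-cents representation are illustrative choices, not part of the original algorithm statement.

def make_change(amount, denominations):
    # Greedily use as many coins of each denomination as possible, largest first.
    counts = {}
    for d in sorted(denominations, reverse=True):
        counts[d], amount = divmod(amount, d)
    return counts

# The worked example: 67 cents with US coins.
print(make_change(67, [25, 10, 5, 1]))   # {25: 2, 10: 1, 5: 1, 1: 2}, 6 coins

# The counterexample: greedy uses 6 coins, but 3 six-cent coins suffice.
print(make_change(18, [7, 6, 1]))        # {7: 2, 6: 0, 1: 4}, 6 coins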
3. A Generic Greedy Algorithm

(1) Initialize C to be the set of candidate solutions.
(2) Initialize a set S = ∅ (the set is to be the optimal solution we are constructing).
(3) While C ≠ ∅ and S is (still) not a solution do
    (3.1) select x from set C using a greedy strategy
    (3.2) delete x from C
    (3.3) if S ∪ {x} is a feasible solution, then S = S ∪ {x} (i.e., add x to set S)
(4) if S is a solution then return S
(5) else return failure

In general, a greedy algorithm is efficient because it makes a sequence of (local) decisions and never backtracks. The solution is not always optimal, however.
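
As a sketch, the template translates directly into Python; the three callback parameters (select, feasible, solution) are assumptions supplied by the caller, not part of the original pseudocode.

def greedy(candidates, select, feasible, solution):
    C = set(candidates)           # (1) candidate set
    S = set()                     # (2) the solution under construction
    while C and not solution(S):  # (3)
        x = select(C)             # (3.1) greedy choice
        C.remove(x)               # (3.2)
        if feasible(S | {x}):     # (3.3) add x only if still feasible
            S = S | {x}
    return S if solution(S) else None   # (4)/(5): None signals failure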
4. The Knapsack Problem

Given n objects, each with a weight wi and a value vi, and given a knapsack of total capacity W. The problem is to pack the knapsack with these objects so as to maximize the total value of the objects packed, without exceeding the knapsack's capacity. More formally, let xi denote the fraction of object i to be included in the knapsack, 0 ≤ xi ≤ 1, for 1 ≤ i ≤ n. The problem is to find values for the xi such that Σ xi·vi is maximized subject to Σ xi·wi ≤ W (both sums over 1 ≤ i ≤ n). Note that we may assume Σ wi > W, because otherwise we would choose xi = 1 for each i, which would be an obvious optimal solution.
5. There seem to be three obvious greedy strategies:

(Max value) Sort the objects from the highest value to the lowest, then pick them in that order.
(Min weight) Sort the objects from the lowest weight to the highest, then pick them in that order.
(Max value/weight ratio) Sort the objects by value-to-weight ratio, from the highest to the lowest, then select.

Example: Given n = 5 objects and a knapsack capacity W = 100, as in Table I. The three solutions are given in Table II.

Table I
  w    10   20   30   40   50
  v    20   30   66   40   60
  v/w  2.0  1.5  2.2  1.0  1.2

Table II
  strategy    selected xi       value
  Max vi      0, 0, 1, 0.5, 1   146
  Min wi      1, 1, 1, 1, 0     156
  Max vi/wi   1, 1, 1, 0, 0.8   164
6. The Optimal Knapsack Algorithm

Input: an integer n, positive values wi and vi for 1 ≤ i ≤ n, and another positive value W.
Output: n values xi such that 0 ≤ xi ≤ 1, Σ xi·wi ≤ W, and Σ xi·vi is maximized.
Algorithm (of time complexity O(n lg n)):
(1) Sort the n objects from large to small based on the ratios vi/wi. We assume the arrays w[1..n] and v[1..n] store the respective weights and values after sorting.
(2) Initialize array x[1..n] to zeros.
(3) weight = 0; i = 1
(4) while (i ≤ n and weight < W) do
    (4.1) if weight + w[i] ≤ W then x[i] = 1
    (4.2) else x[i] = (W − weight) / w[i]
    (4.3) weight = weight + x[i]·w[i]
    (4.4) i = i + 1
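
A Python sketch of this algorithm, under the assumption that the objects are given as (weight, value) pairs; it returns the fractions x alongside the total value.

def fractional_knapsack(items, W):
    # items: list of (weight, value) pairs; returns (fractions x, total value)
    order = sorted(range(len(items)),
                   key=lambda i: items[i][1] / items[i][0], reverse=True)  # (1)
    x = [0.0] * len(items)                      # (2)
    weight = 0.0                                # (3)
    for i in order:                             # (4)
        w, v = items[i]
        if weight + w <= W:
            x[i] = 1.0                          # (4.1) take the whole object
        else:
            x[i] = (W - weight) / w             # (4.2) take the fraction that fits
        weight += x[i] * w                      # (4.3)
        if weight >= W:
            break
    return x, sum(x[i] * items[i][1] for i in range(len(items)))

items = [(10, 20), (20, 30), (30, 66), (40, 40), (50, 60)]   # Table I
print(fractional_knapsack(items, 100))   # ([1.0, 1.0, 1.0, 0.0, 0.8], 164.0)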
7. Optimal 2-way Merge Patterns and Huffman Codes

Example. Suppose there are 3 sorted lists L1, L2, and L3, of sizes 30, 20, and 10, respectively, which need to be merged into a combined sorted list, but we can merge only two at a time. We intend to find an optimal merge pattern which minimizes the total number of comparisons. For example, we can merge L1 and L2, which uses 30 + 20 = 50 comparisons, resulting in a list of size 50. We can then merge this list with list L3, using another 50 + 10 = 60 comparisons, so the total number of comparisons is 50 + 60 = 110. Alternatively, we can merge lists L2 and L3, using 20 + 10 = 30 comparisons; the resulting list (of size 30) can then be merged with list L1, for another 30 + 30 = 60 comparisons. So the total number of comparisons is 30 + 60 = 90. It doesn't take long to see that this latter merge pattern is the optimal one.
8. Binary Merge Trees

We can depict the merge patterns using a binary tree, built from the leaf nodes (the initial lists) towards the root, in which each merge of two nodes creates a parent node whose size is the sum of the sizes of the two children. For example, the two previous merge patterns are depicted in the following two figures:

[Figure: merge L1 and L2, then with L3. Cost = 30×2 + 20×2 + 10×1 = 110.]
[Figure: merge L2 and L3, then with L1. Cost = 30×1 + 20×2 + 10×2 = 90.]

In general, the merge cost is the sum of all weighted external path lengths (each leaf's size times its depth).
9. Optimal Binary Merge Tree Algorithm

Input: n leaf nodes, each with an integer size, n ≥ 2.
Output: a binary tree with the given leaf nodes which has minimum total weighted external path length.
Algorithm:
(1) create a min-heap T[1..n] based on the n initial sizes.
(2) while (the heap size ≥ 2) do
    (2.1) delete from the heap the two smallest values, call them a and b; create a parent node of size a + b for the nodes corresponding to these two values
    (2.2) insert the value (a + b) into the heap, which corresponds to the node created in Step (2.1)

When the algorithm terminates, there is a single value left in the heap, whose corresponding node is the root of the optimal binary merge tree. The algorithm's time complexity is O(n lg n) because Step (1) takes O(n) time, and Step (2) runs O(n) iterations, each of which takes O(lg n) time.
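
A sketch using Python's heapq module as the min-heap; it computes only the total merge cost (the tree itself is implicit in the merge order).

import heapq

def optimal_merge_cost(sizes):
    heap = list(sizes)
    heapq.heapify(heap)                 # Step (1): O(n) heap construction
    cost = 0
    while len(heap) >= 2:               # Step (2)
        a = heapq.heappop(heap)         # (2.1) the two smallest values
        b = heapq.heappop(heap)
        cost += a + b                   # merging a and b costs a + b comparisons
        heapq.heappush(heap, a + b)     # (2.2) the parent node re-enters the heap
    return cost

print(optimal_merge_cost([30, 20, 10]))     # 90, the optimal pattern above
print(optimal_merge_cost([2, 3, 5, 7, 9]))  # 57, matching the example that follows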
10. Example of the Optimal Merge Tree Algorithm

Initially, there are 5 leaf nodes with sizes 2, 3, 5, 7, 9.

Iteration 1: merge 2 and 3 into 5.
Iteration 2: merge 5 and 5 into 10.
Iteration 3: merge 7 and 9 (chosen among 7, 9, and 10) into 16.
Iteration 4: merge 10 and 16 into 26.

[Figure: the final merge tree, with root 26, children 10 and 16; node 10 splits into the leaf 5 and the merged node 5 (leaves 2 and 3), and node 16 splits into leaves 7 and 9.]

Cost = 2×3 + 3×3 + 5×2 + 7×2 + 9×2 = 57.
11. Proof of Optimality of the Binary Merge Tree Algorithm

We use induction on n ≥ 2 to show that the binary merge tree is optimal in that it gives the minimum total weighted external path length (among all possible ways to merge the given leaf nodes into a binary tree).

(Basis) When n = 2, there is only one way to merge two nodes.

(Induction Hypothesis) Suppose the merge tree is optimal when there are k leaf nodes, for some k ≥ 2.

(Induction) Consider (k + 1) leaf nodes. Call them a1, a2, ..., and a(k+1). We may assume nodes a1, a2 have the smallest values, so they are merged in the first step of the algorithm into node b. Call the merge tree T, and call the part of T excluding a1, a2 (with b as a leaf) T′ (see figure). Suppose an optimal binary merge tree is S. We make two observations. (1) If node x of S is a deepest internal node, we may swap its two children with nodes a1, a2 in S without increasing the total weighted external path length. Thus, we may assume tree S has a1, a2 as the children of such a node x, and has a subtree S′ (the tree excluding a1, a2, with x as a leaf) whose leaf nodes are x, a3, ..., and a(k+1). (2) The tree S′ must be an optimal merge tree for the k nodes x, a3, ..., and a(k+1). By the induction hypothesis, tree T′ is optimal for these k values (note b and x carry the same value a1 + a2), so T′ has total weighted external path length equal to that of S′. Since T and S each add the same quantity (a1 + a2) on top of T′ and S′, respectively, the total weighted external path length of T equals that of S, proving the optimality of T.

[Figure: trees T and S with subtrees T′ and S′; in T, node b is the parent of a1 and a2; in S, node x is the parent of a1 and a2.]
12. Huffman Codes

Suppose we wish to save a text (ASCII) file on disk, or to transmit it through a network, using an encoding scheme that minimizes the number of bits required. Without compression, characters are typically encoded by their ASCII codes with 8 bits per character. We can do better if we have the freedom to design our own encoding.

Example. Given a text file that uses only 5 different letters (a, e, i, s, t), the space character, and the newline character. Since there are 7 different characters, we could use 3 bits per character because that allows 8 bit patterns, ranging from 000 through 111 (so we still have one pattern to spare). The following table shows the encoding of the characters, their frequencies, and the size of the encoded (compressed) file.
13. Fixed-Length vs. Variable-Length Encoding

  Character  Frequency  Fixed code  Total bits  Variable code  Total bits
  a          10         000         30          001            30
  e          15         001         45          01             30
  i          12         010         36          10             24
  s          3          011         9           00000          15
  t          4          100         12          0001           16
  space      13         101         39          11             26
  newline    1          110         3           00001          5
  Total      58                     174                        146

If we can use variable lengths for the codes, we can compress more, as shown above. However, the codes must satisfy the property that no code is a prefix of another code; such a code is called a prefix code.
14. How do we design an optimal prefix code (i.e., one with minimum total length) for a given file?

We can depict the codes for the given collection of characters using a binary tree as follows: reading each code from left to right, we construct a binary tree from the root, following the left branch when encountering a 0 and the right branch when encountering a 1. We do this for all the codes, constructing a single combined binary tree. For example, we can build the tree for code 001, then for codes 001 and 01, then for codes 001, 01, and 10, and finally for all of the codes 001, 01, 10, 00000, 0001, 11, and 00001.

[Figure: the binary tree built incrementally from the codes, with 0 labeling left branches and 1 labeling right branches.]

Note: each code terminates at a leaf node, by the prefix property.
15. We note that the encoded file size is equal to the total weighted external path length if we assign each character's frequency to its leaf node. For the example above (leaves s = 3 and newline = 1 at depth 5, t = 4 at depth 4, a = 10 at depth 3, and e = 15, i = 12, space = 13 at depth 2):

Total file size = 3×5 + 1×5 + 4×4 + 10×3 + 15×2 + 12×2 + 13×2 = 146,

which is exactly the total weighted external path length.

We also note that in an optimal prefix code, each node in the tree has either no children or two: if a node x had only one child y, merging x and y would shorten every code in y's subtree, reducing the total size. Thus, the optimal binary merge tree algorithm finds the optimal code (Huffman code).

[Figure: the code tree with frequencies at the leaves; a node x with a single child y can be merged with y, reducing the total size.]
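
Putting the pieces together, here is a sketch that builds a Huffman code with the same heap-based merging and then reads the codes off the tree. The tie-breaking counter and the left = 0 / right = 1 convention are implementation choices; with the frequencies of the earlier table it reproduces the 146-bit total, though the individual codes may differ from the table since ties can be broken differently.

import heapq
from itertools import count

def huffman_codes(freqs):
    tie = count()                      # makes heap entries comparable on ties
    heap = [(f, next(tie), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) >= 2:              # merge the two least frequent subtrees
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (fa + fb, next(tie), (a, b)))
    codes = {}
    def walk(node, code):              # left branch = 0, right branch = 1
        if isinstance(node, tuple):
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:
            codes[node] = code
    walk(heap[0][2], "")
    return codes

freqs = {'a': 10, 'e': 15, 'i': 12, 's': 3, 't': 4, ' ': 13, '\n': 1}
codes = huffman_codes(freqs)
print(sum(freqs[c] * len(codes[c]) for c in freqs))   # 146 bits in total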
16. Greedy Strategies Applied to Graph Problems

We first review some notation and terms about graphs. A graph consists of vertices (nodes) and edges (arcs, links), in which each edge connects two vertices (not necessarily distinct). More formally, a graph G = (V, E), where V and E denote the sets of vertices and edges, respectively.

In this example, V = {1, 2, 3, 4} and E = {a, b, c, d, e}. Edges c and d are parallel edges; edge e is a self-loop. A path is a sequence of adjacent edges, e.g., path abeb, path acdab.

[Figure: a graph on vertices 1 through 4 with edges a, b, c, d, e, where c and d are parallel edges and e is a self-loop.]
17. Directed Graphs vs. (Undirected) Graphs

If every edge has an orientation, e.g., an edge starting from node x and terminating at node y, the graph is called a directed graph, or digraph for short. If no edge has an orientation, the graph is called an undirected graph, or simply, a graph. When there are no parallel edges (two edges that have identical end points), we can identify an edge with its two end points, such as edge (1,2) or edge (3,3). In an undirected graph, edge (1,2) is the same as edge (2,1). We will assume no parallel edges unless otherwise stated.

[Figure: a directed graph. Edges c and d are parallel (directed) edges. Some directed paths are ad and ebac.]
18. Both directed and undirected graphs appear often and naturally in many scientific (call graphs in program analysis), business (query trees, entity-relationship diagrams in databases), and engineering (CAD design) applications. The simplest data structure for representing graphs and digraphs is a 2-dimensional array. Suppose G = (V, E) and |V| = n. Declare an array T[1..n][1..n] so that T[i][j] = 1 if there is an edge (i, j) ∈ E, and 0 otherwise. (Note that in an undirected graph, edges (i, j) and (j, i) refer to the same edge.)

[Figure: the 2-dimensional array for the digraph, called the adjacency matrix.]
19. Sometimes, the edges of a graph or digraph are given a positive weight or cost value. In that case, the adjacency matrix can easily be modified so that T[i][j] = the weight of edge (i, j), and 0 if there is no edge (i, j). Since the adjacency matrix may contain many zeros (when the graph has few edges, known as sparse), a space-efficient representation uses linked lists to represent the edges, known as the adjacency list representation.

[Figure: the adjacency lists for the digraph; edge weights can be stored by adding another field to the list nodes.]
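
A sketch of both representations in Python, using 0-based lists; the 4-node edge list below is made up for illustration.

n = 4
edges = [(0, 1, 5), (0, 3, 2), (2, 1, 7), (3, 2, 4)]   # (from, to, weight)

# Adjacency matrix: T[i][j] = weight of edge (i, j), 0 if no edge.
T = [[0] * n for _ in range(n)]
for i, j, w in edges:
    T[i][j] = w

# Adjacency lists: adj[i] holds (j, weight) pairs, one list node per edge.
adj = [[] for _ in range(n)]
for i, j, w in edges:
    adj[i].append((j, w))

print(T[0][3])   # 2
print(adj[0])    # [(1, 5), (3, 2)]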
20. The Minimum Spanning Tree (MST) Problem

Given a weighted (undirected) graph G = (V, E), where each edge e has a positive weight w(e). A spanning tree of G is a tree (a connected graph without cycles, or circuits) which has V as its vertex set, i.e., the tree connects all vertices of the graph G. If |V| = n, then the tree has n − 1 edges (this is a fact which can be proved by induction). A minimum spanning tree of G is a spanning tree that has the minimum total edge weight.

[Figure: a weighted graph with no parallel edges or self-loops, and a minimum spanning tree (of 4 edges) of weight 3 + 2 + 4 + 6 = 15.]
21. Prim's Algorithm for the Minimum Spanning Tree Problem

Create an array B[1..n] to store the nodes of the MST, and an array T[1..n − 1] to store the edges of the MST. Starting with node 1 (actually, any node can be the starting node), put node 1 in B[1], and find the node closest to it (i.e., the node whose connecting edge to node 1 has the minimum weight, ties broken arbitrarily). Put this node in B[2], and the edge in T[1]. Next look for the node connected to either B[1] or B[2] that is the closest, store the node in B[3], and the corresponding edge in T[2]. In general, in the kth iteration, look for the node not already in B[1..k] that is closest to any node in B[1..k]. Put this node in B[k+1], and the corresponding edge in T[k]. Repeat this process for n − 1 iterations (k = 1 to n − 1). This is a greedy strategy because in each iteration, the algorithm looks for the minimum weight edge to include next while maintaining the tree property (i.e., avoiding cycles). At the end there are exactly n − 1 edges without cycles, which must form a spanning tree.
22. Example: Prim's MST Algorithm

For the weighted graph of the previous slide, starting from node 1:

  Step        Next edge selected    Partial tree (nodes)
  Initially   (none)                {1}
  1           (1,5), weight 3       {1, 5}
  2           (5,4), weight 2       {1, 5, 4}
  3           (4,2), weight 4       {1, 5, 4, 2}
  4           (1,3), weight 6       {1, 5, 4, 2, 3}
23. An Adjacency Matrix Implementation of Prim's Algorithm

Input: W[1..n][1..n] with W[i][j] = weight of edge (i, j); set W[i][j] = ∞ if there is no edge.
Output: an MST with tree edges stored in T[1..n − 1].
Algorithm:
(1) declare arrays nearest[2..n] and minDist[2..n], such that minDist[i] = the minimum edge weight connecting node i to any node in the partial tree T, and nearest[i] = the node in T that gives the minimum distance for node i.
(2) for i = 2 to n do
        nearest[i] = 1; minDist[i] = W[i][1]
(3) for p = 1 to (n − 1) do
    (3.1) min = ∞
    (3.2) for j = 2 to n do
              if 0 ≤ minDist[j] < min then min = minDist[j]; k = j
    (3.3) T[p] = edge (nearest[k], k)   // the next selected edge
    (3.4) minDist[k] = −1               // a negative value means node k is in the tree
    (3.5) for j = 2 to n do             // update minDist and nearest values
              if W[j][k] < minDist[j] then minDist[j] = W[j][k]; nearest[j] = k

The time complexity is O(n²) because Step (3) runs O(n) iterations, and each iteration runs in O(n) time in Steps (3.2) and (3.5).

[Figure: a node i outside the partial tree T, with nearest[i] in T and connecting edge weight minDist[i].]
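
A Python sketch of the same algorithm with 0-based indexing and math.inf in place of ∞; the test matrix is a hypothetical 5-node graph consistent with the example two slides back (nodes 1..5 renumbered to 0..4), so the MST edges come out as (1,5), (5,4), (4,2), (1,3) in the original numbering.

import math

def prim_mst(W):
    n = len(W)
    nearest = [0] * n                      # nearest[i] = tree node closest to i
    minDist = [W[i][0] for i in range(n)]  # cheapest edge from i into the tree
    minDist[0] = -1                        # node 0 starts in the tree
    T = []
    for _ in range(n - 1):
        # (3.2) pick the non-tree node k with the cheapest connecting edge
        k = min((i for i in range(n) if minDist[i] >= 0), key=lambda i: minDist[i])
        T.append((nearest[k], k))          # (3.3) record the selected edge
        minDist[k] = -1                    # (3.4) node k joins the tree
        for j in range(n):                 # (3.5) update distances through k
            if minDist[j] > W[j][k]:
                minDist[j], nearest[j] = W[j][k], k
    return T

INF = math.inf
W = [[INF,   8,   6, INF,   3],
     [  8, INF,   7,   4, INF],
     [  6,   7, INF, INF, INF],
     [INF,   4, INF, INF,   2],
     [  3, INF, INF,   2, INF]]
print(prim_mst(W))   # [(0, 4), (4, 3), (3, 1), (0, 2)], total weight 15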
24. The Single-Source Shortest Paths Problem

Given a directed graph and a single node called the source, for each of the remaining nodes, find a shortest path connected from the source (assuming the direction of the edges along the paths is respected). A greedy algorithm due to Dijkstra, which finds these shortest paths in sequence, can be described as follows: find the shortest among all shortest paths (from the source), then find the second shortest, etc., breaking ties arbitrarily, until all shortest paths are found. During the process, the collection of all the shortest paths determined so far forms a tree; the next shortest path is selected by finding a node that is one edge away from the current tree and has the shortest distance measured from the source.
25. Example (Dijkstra's Shortest Paths Algorithm)

Consider a weighted directed graph with source node 1 and edges (1,2) of weight 50, (1,3) of weight 30, (1,4) of weight 100, (1,5) of weight 10, (5,4) of weight 10, (4,2) of weight 20, and (3,2) of weight 5.

  Step        Node chosen   Remaining nodes C and their distances D from the source
  Initially   (none)        C = {2, 3, 4, 5}, D = (50, 30, 100, 10)
  1           node 5        C = {2, 3, 4}, D = (50, 30, 20)   (D[4] changed from 100)
  2           node 4        C = {2, 3}, D = (40, 30)          (D[2] changed from 50)
  3           node 3        C = {2}, D = (35)                 (D[2] changed from 40)
  4           node 2        C = ∅

  Shortest paths:
    To   Path      Distance
    5    (1,5)     10
    4    (1,5,4)   20
    3    (1,3)     30
    2    (1,3,2)   35
26. Implementation of Dijkstra's Algorithm

Input: W[1..n][1..n] with W[i][j] = weight of edge (i, j); set W[i][j] = ∞ if there is no edge.
Output: an array D[2..n] of distances of shortest paths to each node in {2, ..., n}.
Algorithm:
(1) C = {2, 3, ..., n}   // the set of remaining nodes
(2) for i = 2 to n do D[i] = W[1][i]   // initialize distance from node 1 to node i
(3) repeat the following n − 2 times:   // determine the shortest distances
    (3.1) select the node v of set C that has the minimum value in array D
    (3.2) C = C − {v}   // delete node v from set C
    (3.3) for each node w in C do
              if D[v] + W[v][w] < D[w] then D[w] = D[v] + W[v][w]   // update D[w] if a shorter path to w is found

The algorithm's time complexity is O(n²) because Steps (1) and (2) each take O(n) time, and Step (3) runs O(n) iterations, each of which runs in O(n) time.

[Figure: the tree of shortest paths rooted at node 1; a node w outside the tree may be improved through tree node v via D[v] + W[v][w].]
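
A Python sketch of the algorithm with 0-based indexing (node 0 is the source) and math.inf for missing edges; the matrix encodes the digraph of the previous example, with nodes 1..5 renumbered to 0..4.

import math

def dijkstra(W):
    n = len(W)
    C = set(range(1, n))                       # (1) remaining nodes
    D = [0] + [W[0][i] for i in range(1, n)]   # (2) one-edge distances from node 0
    for _ in range(n - 2):                     # (3)
        v = min(C, key=lambda i: D[i])         # (3.1) closest remaining node
        C.remove(v)                            # (3.2)
        for w in C:                            # (3.3) shorter path through v?
            if D[v] + W[v][w] < D[w]:
                D[w] = D[v] + W[v][w]
    return D

INF = math.inf
W = [[INF,  50,  30, 100,  10],
     [INF, INF, INF, INF, INF],
     [INF,   5, INF, INF, INF],
     [INF,  20, INF, INF, INF],
     [INF, INF, INF,  10, INF]]
print(dijkstra(W))   # [0, 35, 30, 20, 10]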
27. Graph (and Digraph) Traversal Techniques

Given a (directed) graph G = (V, E), determine all nodes that are connected from a given node v via a (directed) path. There are essentially two graph traversal algorithms, known as breadth-first search (BFS) and depth-first search (DFS), both of which can be implemented efficiently.

BFS: From node v, visit each of its neighboring nodes in sequence, then visit their neighbors, etc., while avoiding repeated visits.

DFS: From node v, visit its first neighboring node and all of its neighbors using recursion, then visit node v's second neighbor, applying the same procedure, until all of v's neighbors are visited, while avoiding repeated visits.
28. Breadth-First Search (BFS)

BFS(v)   // visit all nodes reachable from node v
(1) create an empty FIFO queue Q, add node v to Q
(2) create a boolean array visited[1..n], initialize all values to false except for visited[v], which is set to true
(3) while Q is not empty do
    (3.1) delete a node w from Q
    (3.2) for each node z adjacent from node w do
              if visited[z] is false then add node z to Q and set visited[z] to true

The time complexity is O(n + e) for a graph with n nodes and e edges, if adjacency lists are used. This is because in the worst case, each node is added to the queue once (the O(n) part), and each of its neighbors is considered once (the O(e) part).

[Figure: the node search order starting with node 1, including two nodes that are not reached.]
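
A Python sketch over adjacency lists, 0-based; the small digraph is made up to mirror the slide's point that some nodes may not be reached.

from collections import deque

def bfs(adj, v):
    visited = [False] * len(adj)
    visited[v] = True
    Q = deque([v])                     # Steps (1)-(2)
    order = []
    while Q:                           # Step (3)
        w = Q.popleft()                # (3.1)
        order.append(w)
        for z in adj[w]:               # (3.2) enqueue unvisited neighbors
            if not visited[z]:
                visited[z] = True
                Q.append(z)
    return order

adj = [[1, 3], [2], [0], [2], [5], []]   # a made-up 6-node digraph
print(bfs(adj, 0))   # [0, 1, 3, 2]; nodes 4 and 5 are not reached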
29. Depth-First Search (DFS)

(1) create a boolean array visited[1..n], initialize all values to false except for visited[v], which is set to true
(2) call DFS(v) to visit all nodes reachable via a path

DFS(v)
    for each neighboring node w of v do
        if visited[w] is false then set visited[w] to true and call DFS(w)   // recursive call

The algorithm's time complexity is also O(n + e), by the same reasoning as in the BFS algorithm.

[Figure: the node search order starting with node 1, including two nodes that are not reached.]
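
The recursive counterpart as a Python sketch, with the same conventions and the same made-up digraph as the BFS sketch above.

def dfs(adj, v):
    visited = [False] * len(adj)
    order = []
    def visit(w):                      # DFS(w) in the pseudocode
        visited[w] = True
        order.append(w)
        for z in adj[w]:               # recurse on unvisited neighbors
            if not visited[z]:
                visit(z)
    visit(v)
    return order

adj = [[1, 3], [2], [0], [2], [5], []]
print(dfs(adj, 0))   # [0, 1, 2, 3]; nodes 4 and 5 are again not reached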