Title: Design and Analysis of Computer Algorithm, Lecture 5-3
1 Design and Analysis of Computer Algorithm, Lecture 5-3
- Pradondet Nilagupta
- Department of Computer Engineering
This lecture note has been adapted from the lecture note for 23250 by Prof. Francis Chin and CS332 by David Luebke.
2 Greedy Method (Cont.)
3 Disjoint-Set Union Problem
- Want a data structure to support disjoint sets
- Collection of disjoint sets S = {Si}, with Si ∩ Sj = ∅ for i ≠ j
- Need to support the following operations
- MakeSet(x): S = S ∪ {{x}}
- Union(Si, Sj): S = S - {Si, Sj} ∪ {Si ∪ Sj}
- FindSet(x): return Si ∈ S such that x ∈ Si
- Before discussing implementation details, we look at an example application: MSTs
4 Kruskal's Algorithm
- Kruskal()
-   T = ∅
-   for each v ∈ V
-     MakeSet(v)
-   sort E by increasing edge weight w
-   for each (u,v) ∈ E (in sorted order)
-     if FindSet(u) ≠ FindSet(v)
-       T = T ∪ {(u,v)}
-       Union(FindSet(u), FindSet(v))
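The pseudocode above can be sketched in Python. The graph representation (an edge list of (weight, u, v) tuples) and the simple parent-pointer disjoint set are illustrative assumptions, not part of the slides; a production version would use the union-by-rank/path-compression structure discussed later.

```python
def kruskal(vertices, edges):
    """Minimum spanning tree via Kruskal's algorithm.

    vertices: iterable of vertex names
    edges: list of (weight, u, v) tuples
    Returns the list of MST edges as (weight, u, v) tuples.
    """
    # MakeSet(v): each vertex starts in its own singleton set.
    parent = {v: v for v in vertices}

    def find_set(x):
        # Follow parent pointers up to the set representative.
        while parent[x] != x:
            x = parent[x]
        return x

    def union(a, b):
        parent[a] = b  # merge the two sets by re-pointing one root

    T = []
    for w, u, v in sorted(edges):        # sort E by increasing weight
        ru, rv = find_set(u), find_set(v)
        if ru != rv:                     # endpoints lie in different trees
            T.append((w, u, v))
            union(ru, rv)
    return T
```

Sorting tuples by their first component gives the required edge ordering for free.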
5-27 Kruskal's Algorithm: Run the Algorithm
(Slides 5-27 step through Kruskal's algorithm on an example graph whose edge weights are 1, 2, 5, 8, 9, 13, 14, 17, 19, 21, and 25; the graph figure itself is not recoverable from this transcript. Each slide repeats the pseudocode above and highlights the edge currently under consideration: the edges are examined in increasing order of weight, and each edge is added to T unless FindSet shows its endpoints already belong to the same tree.)
28 Correctness of Kruskal's Algorithm
- Sketch of a proof that this algorithm produces an MST:
- Assume the algorithm is wrong: the result is not an MST
- Then the algorithm adds a wrong edge at some point
- If it adds a wrong edge, there must be a lower-weight edge crossing the same cut (cut-and-paste argument)
- But the algorithm chooses the lowest-weight edge at each step. Contradiction
- Again, it is important to be comfortable with cut-and-paste arguments
29 Kruskal's Algorithm
What will affect the running time?
Kruskal()
  T = ∅
  for each v ∈ V
    MakeSet(v)
  sort E by increasing edge weight w
  for each (u,v) ∈ E (in sorted order)
    if FindSet(u) ≠ FindSet(v)
      T = T ∪ {(u,v)}
      Union(FindSet(u), FindSet(v))
30 Kruskal's Algorithm
What will affect the running time?
- 1 sort
- O(V) MakeSet() calls
- O(E) FindSet() calls
- O(V) Union() calls (Exactly how many Unions?)
Kruskal()
  T = ∅
  for each v ∈ V
    MakeSet(v)
  sort E by increasing edge weight w
  for each (u,v) ∈ E (in sorted order)
    if FindSet(u) ≠ FindSet(v)
      T = T ∪ {(u,v)}
      Union(FindSet(u), FindSet(v))
31 Kruskal's Algorithm: Running Time
- To summarize:
- Sort edges: O(E lg E)
- O(V) MakeSet()s
- O(E) FindSet()s
- O(V) Union()s
- Upshot:
- The best disjoint-set union algorithm makes the above 3 operations take O(E·α(E,V)) time, where α is almost constant
- Overall thus O(E lg E); without the sorting, almost linear
32 Disjoint Set Union
- So how do we implement disjoint-set union?
- Naïve implementation: use a linked list to represent each set
- MakeSet(): ??? time
- FindSet(): ??? time
- Union(A,B): copy elements of A into B: ??? time
33 Disjoint Set Union
- So how do we implement disjoint-set union?
- Naïve implementation: use a linked list to represent each set
- MakeSet(): O(1) time
- FindSet(): O(1) time
- Union(A,B): copy elements of A into B: O(|A|) time
- How long can a single Union() take?
- How long will n Union()s take?
34 Disjoint Set Union: Analysis
- Worst-case analysis: O(n²) time for n Unions
- Union(S1, S2): copy 1 element
- Union(S2, S3): copy 2 elements
- ...
- Union(Sn-1, Sn): copy n-1 elements
- O(n²)
- Improvement: always copy the smaller set into the larger
- Why will this make things better?
- What is the worst-case time of Union()?
- But now n Unions take only O(n lg n) time!
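The "copy the smaller set into the larger" improvement can be sketched with Python lists standing in for the linked lists; the function names are illustrative. The O(n lg n) bound follows because every time an element is copied, the size of the set containing it at least doubles, so each element is copied at most lg n times.

```python
def make_structure(elements):
    """MakeSet for every element: each maps to the list representing its set."""
    return {x: [x] for x in elements}

def find_set(set_of, x):
    return set_of[x]  # O(1): direct pointer from element to its set

def union(set_of, a, b):
    """Weighted union: copy the smaller list into the larger one."""
    sa, sb = set_of[a], set_of[b]
    if sa is sb:
        return                    # already in the same set
    if len(sa) > len(sb):
        sa, sb = sb, sa           # make sa the smaller set
    sb.extend(sa)                 # copy smaller into larger: O(|smaller|)
    for x in sa:
        set_of[x] = sb            # redirect each moved element's pointer
```
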
35 Huffman Code
36 Data Compression
- Motivation
- Limited network bandwidth.
- Limited disk space.
- Huffman coding
- Variable-length coding
- Shorter codes are used to encode characters that occur frequently.
37 Fixed-length code
- Each symbol is encoded using the same number of bits.
- C symbols
- ⌈log2 C⌉ bits
- Simple, easy to work with
- TEST
- 100001011100
38 Example: Fixed-length code
39 Code Trees
(Figure: a code tree whose leaves carry the codewords 000, 001, 010, 011, 100, 101, 110.)
40 Code Trees
TASTE = 11 01 00000 11 10
= 1101000001110
42 Code Trees
(Figure: a tree whose interior nodes also carry characters P, A, E, T, sp, I, S, nl.)
Wrong: the code is ambiguous, since
1101000001110 = 11 01 00 00 01 11 0, or
= 11 01 00000 11 10
43 Code Trees
- Characters are placed only at the leaves.
- No character code is a prefix of another character code.
- Prefix code
- A sequence of bits can always be decoded unambiguously.
- Full Trees
- All nodes are leaves or have two children.
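The unambiguous-decoding property can be demonstrated with the TASTE code from slide 40 (T = 11, A = 01, S = 00000, E = 10, as read off that tree). The dictionary-based decoder below is an illustrative sketch; walking the actual binary tree bit by bit is equivalent.

```python
def decode(bits, code):
    """Decode a bit string using a prefix code.

    code: dict mapping each character to its codeword.
    Because no codeword is a prefix of another, we can scan left to
    right and emit a character as soon as the buffer matches a codeword.
    """
    reverse = {word: ch for ch, word in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in reverse:          # unique match, thanks to the prefix property
            out.append(reverse[buf])
            buf = ""
    if buf:
        raise ValueError("leftover bits: not a valid encoding")
    return "".join(out)
```
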
44 Problem Description
- Input
- A list of symbols and their frequencies
- Output
- A code tree with minimum total cost
- Total cost = Σs ls · fs, where
- ls = number of bits of the code for symbol s
- fs = frequency of symbol s
- Algorithm
- Huffman's algorithm
45 Huffman's algorithm
- Maintain a forest of trees.
- Weight of a tree = sum of the frequencies of its leaves
- Initially, there are C single-node trees, one for each character.
- Select two trees, T1 and T2, of smallest weights, breaking ties arbitrarily, and form a new tree with T1 and T2 as subtrees.
- Repeat the previous step until there is only 1 tree. That tree is the optimal Huffman coding tree.
47 (Figure: an intermediate state of Huffman's algorithm, with merged subtrees T1-T4 of weights 41, 39, and 67 over the leaves A, T, sp, E, I, S, nl.)
49 Different Optimal Code Trees
50 Exercise (1/2)
areleeelaeelarrn..
51 Exercise (2/2)
52 Maintaining the Trees
- Operations
- C initial inserts
- 2(C-1) deletes and C-1 inserts
- Maintain a sorted list using a linked list
- O(C) per insert, O(1) per delete
- O(C²) total running time
- Use priority queues
- O(log C) per insert, O(log C) per delete
- O(C log C) total running time
53 Code Table
- How to store the code table?
- Use some kind of array
- Problem: codes have different lengths.
54 Parenthesized Infix Expression
- E → symb | (E, symb, E)
- symb → a | b | c | d | e | o
- (((a,o,b),o,c),o,(d,o,e))
55 Prefix expression
- E → symb | ∆ E E
- symb → a | d | m | k | f
- ∆∆∆adm∆kf
- prefix 000111011
- leaves admkf
56 Encoding the code tree
- Let interior nodes contain the symbol ∆
- Construct the prefix expression E = ∆∆∆adm∆kf
- Remove the ∆s from E:
- admkf
- In E, replace ∆ with 0, others with 1:
- 000111011
57 Encoding the code tree
- Tree structure
- 0 for interior nodes
- 1 for leaf nodes
- encoded as a prefix expression
- 000111011
- Information in leaf nodes
- Array
- All entries have the same size.
- a d m k f
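The encoding on the two slides above is a preorder traversal that emits 0 for interior nodes and 1 for leaves, plus the leaf symbols in order. A minimal sketch, with an assumed Node class for illustration:

```python
class Node:
    """A code-tree node: interior nodes have two children, leaves hold a symbol."""
    def __init__(self, symbol=None, left=None, right=None):
        self.symbol, self.left, self.right = symbol, left, right

def encode_tree(node):
    """Return (structure bits, leaf symbols) as prefix-expression strings."""
    if node.left is None:               # leaf: emit 1 and record its symbol
        return "1", node.symbol
    lb, ls = encode_tree(node.left)     # preorder: node, then left, then right
    rb, rs = encode_tree(node.right)
    return "0" + lb + rb, ls + rs
```

For the tree of slide 55, ∆(∆(∆(a,d), m), ∆(k,f)), this yields the structure string 000111011 and the leaf array admkf.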
58 More Examples: Huffman codes
- If we use a variable number of bits per code, such that frequent characters use fewer bits and infrequent characters use more bits, we can decrease the space needed to store the same information. For example, consider the following sentence:
- dead beef cafe deeded dad. dad faced a faded cab. dad acceded. dad be bad.
- There are 12 a's, 4 b's, 5 c's, 19 d's, 12 e's, 4 f's, 17 spaces, and 4 periods, for a total of 77 characters.
59 Fixed-length code
- If we use a fixed-length code like this:
- 000 (space)
- 001 a
- 010 b
- 011 c
- 100 d
- 101 e
- 110 f
- 111 .
Then the sentence, which is of length 77, consumes 77 × 3 = 231 bits.
60 Variable-length code
- If we use a variable-length code like this:
- 100 (space)
- 110 a
- 11110 b
- 1110 c
- 0 d
- 1010 e
- 11111 f
- 1011 .
we can encode the text in 12×3 + 4×5 + 5×4 + 19×1 + 12×4 + 4×5 + 17×3 + 4×4 = 230 bits. That's a savings of 1 bit.
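The two counts above can be checked directly. The frequencies and codewords below are exactly those listed on slides 58-60; only the helper names are invented for this sketch.

```python
# Character frequencies from slide 58 (77 characters total).
freq = {"a": 12, "b": 4, "c": 5, "d": 19, "e": 12, "f": 4, " ": 17, ".": 4}

fixed = {ch: 3 for ch in freq}          # every symbol gets a 3-bit code
variable = {" ": "100", "a": "110", "b": "11110", "c": "1110",
            "d": "0", "e": "1010", "f": "11111", ".": "1011"}

def total_bits(freq, code_len):
    # cost = sum over symbols of (code length) x (frequency)
    return sum(code_len[ch] * n for ch, n in freq.items())

fixed_bits = total_bits(freq, fixed)
variable_bits = total_bits(freq, {ch: len(w) for ch, w in variable.items()})
print(fixed_bits, variable_bits)        # prints "231 230"
```
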
61 Binary Code
- Suppose that we have a large amount of text that we wish to store on a computer disk in an efficient way. The simplest way to do this is simply to assign a binary code to each character, and then store the binary codes consecutively in the computer memory.
- The ASCII system, for example, uses a fixed 8-bit code to represent each character. Storing n characters as ASCII text requires 8n bits of memory.
62 Binary Code (cont.)
- Suppose that we are storing only the 10 numeric characters 0, 1, ..., 9.
63 Non-random data
- Consider the following data, which is taken from a PostScript file.
64 A good code
- What would happen if we used the following code to store the data rather than the fixed-length code?
65 Example
- To store the string 0748901
- we would get 0000011101001000100100000001 using the fixed-length code, and
- 10110001010000111011010 using the variable-length code.
66 Prefix codes
- A prefix code is a code in which no codeword is a prefix of any other codeword. Decoding such a code is done using a binary tree.
(Figure: the decoding tree, a binary tree with 0/1 edge labels and the digits 0-9 at its leaves.)
67 Optimal trees
- A tree representing an optimal code for a file is always a full binary tree, namely one where every node is either a leaf or has precisely two children.
68 Why does it work?
- In order to show that Huffman's algorithm works, we must show that there can be no prefix code that is better than the one produced by Huffman's algorithm.
69 Huffman Codes
- Let T be the tree, C be the set of characters c that comprise the alphabet, and f(c) be the frequency of character c. Since the number of bits for a character is the same as its depth in the binary tree, we can express the sum in terms of dT(c), the depth of character c in the tree:
- B(T) = Σ_{c∈C} f(c) · dT(c)
- This is the sum we want to minimize. We'll call it the cost, B(T), of the tree. Now we just need an algorithm that will build a tree with minimal cost.
70 Huffman Pseudocode
- Huffman(C)
- n = the size of C
- insert all the elements of C into Q,
- using the frequency of the node as the priority
- for i in 1..n-1 do
- z = a new tree node
- x = Extract-Minimum(Q)
- y = Extract-Minimum(Q)
- left child of z = x
- right child of z = y
- f[z] = f[x] + f[y]
- Insert(Q, z)
- end for
- return Extract-Minimum(Q) as the complete tree
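The pseudocode above maps directly onto Python's heapq priority queue. This is an illustrative sketch: the tie-break counter and the tuple tree representation are implementation choices, not part of the slides.

```python
import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman code. freq: dict symbol -> frequency.

    Returns a dict symbol -> codeword string.
    """
    tick = count()  # tie-breaker so the heap never has to compare trees
    # Q holds (frequency, tiebreak, tree); a tree is a symbol or a (left, right) pair.
    q = [(f, next(tick), s) for s, f in freq.items()]
    heapq.heapify(q)
    for _ in range(len(freq) - 1):
        fx, _, x = heapq.heappop(q)      # x = Extract-Minimum(Q)
        fy, _, y = heapq.heappop(q)      # y = Extract-Minimum(Q)
        heapq.heappush(q, (fx + fy, next(tick), (x, y)))  # f[z] = f[x] + f[y]
    _, _, root = q[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # interior node: 0 left, 1 right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the symbol's codeword
            codes[node] = prefix or "0"
    walk(root, "")
    return codes
```

Each of the C-1 iterations does two deletes and one insert at O(log C) each, matching the O(C log C) total from slide 52.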
71 Proving Optimality
- Greedy-choice property
- building an optimal tree can begin by merging the two lowest-frequency characters
- Optimal-substructure property
(Figure: an optimal tree contains an optimal subtree.)
72 Greedy-Choice Property (1/2)
- Let x and y be two characters with the lowest frequencies.
- Prove that there exists an optimal code tree where x and y appear as sibling leaves of maximum depth in the tree.
73 Greedy-Choice Property (2/2)
(Figure: the exchange argument, transforming an optimal tree T into trees T' and T'' by swapping x and y with the deepest sibling leaves b and c, without increasing the cost.)
74 Optimal-Substructure Property
(Figure: replacing sibling leaves x and y in T with a single leaf c of frequency f(c) = f(x) + f(y) yields a smaller tree T' that must be optimal for the reduced alphabet.)
75 Conclusion
- Greedy algorithms are
- simple
- easy to invent
- easy to implement
- efficient
- but they do not always yield optimal solutions
- they require the greedy-choice and optimal-substructure properties for optimality