Title: Design and Analysis of Computer Algorithm, Lecture 5-3
1 Design and Analysis of Computer Algorithm, Lecture 5-3
- Pradondet Nilagupta
- Department of Computer Engineering
This lecture note has been adapted from the lecture note for 23250 by Prof. Francis Chin and CS332 by David Luebke.
2 Greedy Method (Cont.)
3 Disjoint-Set Union Problem
- Want a data structure to support disjoint sets
- Collection of disjoint sets S = {Si}, with Si ∩ Sj = ∅ for i ≠ j
- Need to support the following operations
- MakeSet(x): S = S ∪ {{x}}
- Union(Si, Sj): S = S - {Si, Sj} ∪ {Si ∪ Sj}
- FindSet(x): return Si ∈ S such that x ∈ Si
- Before discussing implementation details, we look at an example application: MSTs
4 Kruskal's Algorithm
- Kruskal()
-   T = ∅
-   for each v ∈ V
-     MakeSet(v)
-   sort E by increasing edge weight w
-   for each (u,v) ∈ E (in sorted order)
-     if FindSet(u) ≠ FindSet(v)
-       T = T ∪ {(u,v)}
-       Union(FindSet(u), FindSet(v))
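The pseudocode above can be sketched in Python. The graph representation (an edge list of (weight, u, v) tuples) and the simple parent-pointer disjoint set are illustrative assumptions, not part of the slides; a production version would use the union-by-rank/path-compression structure discussed later.

```python
def kruskal(vertices, edges):
    """Minimum spanning tree via Kruskal's algorithm.

    vertices: iterable of vertex names
    edges: list of (weight, u, v) tuples
    Returns the list of MST edges as (weight, u, v) tuples.
    """
    # MakeSet(v): each vertex starts in its own singleton set.
    parent = {v: v for v in vertices}

    def find_set(x):
        # Follow parent pointers up to the set representative.
        while parent[x] != x:
            x = parent[x]
        return x

    def union(a, b):
        parent[a] = b  # merge the two sets by re-pointing one root

    T = []
    for w, u, v in sorted(edges):        # sort E by increasing weight
        ru, rv = find_set(u), find_set(v)
        if ru != rv:                     # endpoints lie in different trees
            T.append((w, u, v))
            union(ru, rv)
    return T
```

Sorting tuples by their first component gives the required edge ordering for free.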
5-27 Kruskal's Algorithm: Run the Algorithm
(Slides 5-27 step through Kruskal's algorithm on an example graph whose edge weights are 1, 2, 5, 8, 9, 13, 14, 17, 19, 21, and 25; the graph figure itself is not recoverable from this transcript. Each slide repeats the pseudocode above and highlights the edge currently under consideration: the edges are examined in increasing order of weight, and each edge is added to T unless FindSet shows its endpoints already belong to the same tree.)
28 Correctness of Kruskal's Algorithm
- Sketch of a proof that this algorithm produces an MST:
- Assume the algorithm is wrong: the result is not an MST
- Then the algorithm adds a wrong edge at some point
- If it adds a wrong edge, there must be a lower-weight edge crossing the same cut (cut-and-paste argument)
- But the algorithm chooses the lowest-weight edge at each step. Contradiction
- Again, it is important to be comfortable with cut-and-paste arguments
29 Kruskal's Algorithm
What will affect the running time?
Kruskal()
  T = ∅
  for each v ∈ V
    MakeSet(v)
  sort E by increasing edge weight w
  for each (u,v) ∈ E (in sorted order)
    if FindSet(u) ≠ FindSet(v)
      T = T ∪ {(u,v)}
      Union(FindSet(u), FindSet(v))
30 Kruskal's Algorithm
What will affect the running time?
- 1 sort
- O(V) MakeSet() calls
- O(E) FindSet() calls
- O(V) Union() calls (Exactly how many Unions?)
Kruskal()
  T = ∅
  for each v ∈ V
    MakeSet(v)
  sort E by increasing edge weight w
  for each (u,v) ∈ E (in sorted order)
    if FindSet(u) ≠ FindSet(v)
      T = T ∪ {(u,v)}
      Union(FindSet(u), FindSet(v))
31 Kruskal's Algorithm: Running Time
- To summarize:
- Sort edges: O(E lg E)
- O(V) MakeSet()s
- O(E) FindSet()s
- O(V) Union()s
- Upshot:
- The best disjoint-set union algorithm makes the above 3 operations take O(E·α(E,V)) time, where α is almost constant
- Overall thus O(E lg E); without the sorting, almost linear
32 Disjoint Set Union
- So how do we implement disjoint-set union?
- Naïve implementation: use a linked list to represent each set
- MakeSet(): ??? time
- FindSet(): ??? time
- Union(A,B): copy elements of A into B: ??? time
33 Disjoint Set Union
- So how do we implement disjoint-set union?
- Naïve implementation: use a linked list to represent each set
- MakeSet(): O(1) time
- FindSet(): O(1) time
- Union(A,B): copy elements of A into B: O(|A|) time
- How long can a single Union() take?
- How long will n Union()s take?
34 Disjoint Set Union: Analysis
- Worst-case analysis: O(n²) time for n Unions
- Union(S1, S2): copy 1 element
- Union(S2, S3): copy 2 elements
- ...
- Union(Sn-1, Sn): copy n-1 elements
- O(n²)
- Improvement: always copy the smaller set into the larger
- Why will this make things better?
- What is the worst-case time of Union()?
- But now n Unions take only O(n lg n) time!
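The "copy the smaller set into the larger" improvement can be sketched with Python lists standing in for the linked lists; the function names are illustrative. The O(n lg n) bound follows because every time an element is copied, the size of the set containing it at least doubles, so each element is copied at most lg n times.

```python
def make_structure(elements):
    """MakeSet for every element: each maps to the list representing its set."""
    return {x: [x] for x in elements}

def find_set(set_of, x):
    return set_of[x]  # O(1): direct pointer from element to its set

def union(set_of, a, b):
    """Weighted union: copy the smaller list into the larger one."""
    sa, sb = set_of[a], set_of[b]
    if sa is sb:
        return                    # already in the same set
    if len(sa) > len(sb):
        sa, sb = sb, sa           # make sa the smaller set
    sb.extend(sa)                 # copy smaller into larger: O(|smaller|)
    for x in sa:
        set_of[x] = sb            # redirect each moved element's pointer
```
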
35 Huffman Code
36 Data Compression
- Motivation
- Limited network bandwidth.
- Limited disk space.
- Huffman coding
- Variable-length coding
- Shorter codes are used to encode characters that occur frequently.
37 Fixed-length code
- Each symbol is encoded using the same number of bits.
- C symbols
- ⌈log2 C⌉ bits
- Simple, easy to work with
- TEST
- 100001011100
38 Example: Fixed-length code
39 Code Trees
(Figure: a code tree whose leaves carry the codewords 000, 001, 010, 011, 100, 101, 110.)
40 Code Trees
TASTE = 11 01 00000 11 10
= 1101000001110
42 Code Trees
(Figure: a tree whose interior nodes also carry characters P, A, E, T, sp, I, S, nl.)
Wrong: the code is ambiguous, since
1101000001110 = 11 01 00 00 01 11 0, or
= 11 01 00000 11 10
43 Code Trees
- Characters are placed only at the leaves.
- No character code is a prefix of another character code.
- Prefix code
- A sequence of bits can always be decoded unambiguously.
- Full Trees
- All nodes are leaves or have two children.
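The unambiguous-decoding property can be demonstrated with the TASTE code from slide 40 (T = 11, A = 01, S = 00000, E = 10, as read off that tree). The dictionary-based decoder below is an illustrative sketch; walking the actual binary tree bit by bit is equivalent.

```python
def decode(bits, code):
    """Decode a bit string using a prefix code.

    code: dict mapping each character to its codeword.
    Because no codeword is a prefix of another, we can scan left to
    right and emit a character as soon as the buffer matches a codeword.
    """
    reverse = {word: ch for ch, word in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in reverse:          # unique match, thanks to the prefix property
            out.append(reverse[buf])
            buf = ""
    if buf:
        raise ValueError("leftover bits: not a valid encoding")
    return "".join(out)
```
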
44 Problem Description
- Input
- A list of symbols and their frequencies
- Output
- A code tree with minimum total cost
- Total cost = Σs ls · fs, where
- ls = number of bits of the code for symbol s
- fs = frequency of symbol s
- Algorithm
- Huffman's algorithm
45 Huffman's algorithm
- Maintain a forest of trees.
- Weight of a tree = sum of the frequencies of its leaves
- Initially, there are C single-node trees, one for each character.
- Select two trees, T1 and T2, of smallest weights, breaking ties arbitrarily, and form a new tree with T1 and T2 as subtrees.
- Repeat the previous step until there is only 1 tree. That tree is the optimal Huffman coding tree.
47 (Figure: an intermediate state of Huffman's algorithm, with merged subtrees T1-T4 of weights 41, 39, and 67 over the leaves A, T, sp, E, I, S, nl.)
49 Different Optimal Code Trees
50 Exercise (1/2)
areleeelaeelarrn..
51 Exercise (2/2)
52 Maintaining the Trees
- Operations
- C initial inserts
- 2(C-1) deletes and C-1 inserts
- Maintain a sorted list using a linked list
- O(C) per insert, O(1) per delete
- O(C²) total running time
- Use priority queues
- O(log C) per insert, O(log C) per delete
- O(C log C) total running time
53 Code Table
- How to store the code table?
- Use some kind of array
- Problem: codes have different lengths.
54 Parenthesized Infix Expression
- E → symb | (E, symb, E)
- symb → a | b | c | d | e | o
- (((a,o,b),o,c),o,(d,o,e))
55 Prefix expression
- E → symb | ∆ E E
- symb → a | d | m | k | f
- ∆∆∆adm∆kf
- prefix 000111011
- leaves admkf
56 Encoding the code tree
- Let interior nodes contain the symbol ∆
- Construct the prefix expression E = ∆∆∆adm∆kf
- Remove the ∆s from E:
- admkf
- In E, replace ∆ with 0, others with 1:
- 000111011
57 Encoding the code tree
- Tree structure
- 0 for interior nodes
- 1 for leaf nodes
- encoded as a prefix expression
- 000111011
- Information in leaf nodes
- Array
- All entries have the same size.
- a d m k f
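The encoding on the two slides above is a preorder traversal that emits 0 for interior nodes and 1 for leaves, plus the leaf symbols in order. A minimal sketch, with an assumed Node class for illustration:

```python
class Node:
    """A code-tree node: interior nodes have two children, leaves hold a symbol."""
    def __init__(self, symbol=None, left=None, right=None):
        self.symbol, self.left, self.right = symbol, left, right

def encode_tree(node):
    """Return (structure bits, leaf symbols) as prefix-expression strings."""
    if node.left is None:               # leaf: emit 1 and record its symbol
        return "1", node.symbol
    lb, ls = encode_tree(node.left)     # preorder: node, then left, then right
    rb, rs = encode_tree(node.right)
    return "0" + lb + rb, ls + rs
```

For the tree of slide 55, ∆(∆(∆(a,d), m), ∆(k,f)), this yields the structure string 000111011 and the leaf array admkf.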
58 More Examples: Huffman codes
- If we use a variable number of bits per code, such that frequent characters use fewer bits and infrequent characters use more bits, we can decrease the space needed to store the same information. For example, consider the following sentence:
- dead beef cafe deeded dad. dad faced a faded cab. dad acceded. dad be bad.
- There are 12 a's, 4 b's, 5 c's, 19 d's, 12 e's, 4 f's, 17 spaces, and 4 periods, for a total of 77 characters.
59 Fixed-length code
- If we use a fixed-length code like this:
- 000 (space)
- 001 a
- 010 b
- 011 c
- 100 d
- 101 e
- 110 f
- 111 .
Then the sentence, which is of length 77, consumes 77 × 3 = 231 bits.
60 Variable-length code
- If we use a variable-length code like this:
- 100 (space)
- 110 a
- 11110 b
- 1110 c
- 0 d
- 1010 e
- 11111 f
- 1011 .
we can encode the text in 12×3 + 4×5 + 5×4 + 19×1 + 12×4 + 4×5 + 17×3 + 4×4 = 230 bits. That's a savings of 1 bit.
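The two counts above can be checked directly. The frequencies and codewords below are exactly those listed on slides 58-60; only the helper names are invented for this sketch.

```python
# Character frequencies from slide 58 (77 characters total).
freq = {"a": 12, "b": 4, "c": 5, "d": 19, "e": 12, "f": 4, " ": 17, ".": 4}

fixed = {ch: 3 for ch in freq}          # every symbol gets a 3-bit code
variable = {" ": "100", "a": "110", "b": "11110", "c": "1110",
            "d": "0", "e": "1010", "f": "11111", ".": "1011"}

def total_bits(freq, code_len):
    # cost = sum over symbols of (code length) x (frequency)
    return sum(code_len[ch] * n for ch, n in freq.items())

fixed_bits = total_bits(freq, fixed)
variable_bits = total_bits(freq, {ch: len(w) for ch, w in variable.items()})
print(fixed_bits, variable_bits)        # prints "231 230"
```
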
61 Binary Code
- Suppose that we have a large amount of text that we wish to store on a computer disk in an efficient way. The simplest way to do this is simply to assign a binary code to each character, and then store the binary codes consecutively in the computer memory.
- The ASCII system, for example, uses a fixed 8-bit code to represent each character. Storing n characters as ASCII text requires 8n bits of memory.
62 Binary Code (cont.)
- Suppose that we are storing only the 10 numeric characters 0, 1, ..., 9.
63 Non-random data
- Consider the following data, which is taken from a PostScript file.
64 A good code
- What would happen if we used the following code to store the data rather than the fixed-length code?
65 Example
- To store the string 0748901
- we would get 0000011101001000100100000001 using the fixed-length code, and
- 10110001010000111011010 using the variable-length code.
66 Prefix codes
- A prefix code is a code in which no codeword is a prefix of any other codeword. Decoding such a code is done using a binary tree.
(Figure: the decoding tree, a binary tree with 0/1 edge labels and the digits 0-9 at its leaves.)
67 Optimal trees
- A tree representing an optimal code for a file is always a full binary tree, namely one where every node is either a leaf or has precisely two children.
68 Why does it work?
- In order to show that Huffman's algorithm works, we must show that there can be no prefix code that is better than the one produced by Huffman's algorithm.
69 Huffman Codes
- Let T be the tree, C be the set of characters c that comprise the alphabet, and f(c) be the frequency of character c. Since the number of bits for a character is the same as its depth in the binary tree, we can express the sum in terms of dT(c), the depth of character c in the tree:
- B(T) = Σ_{c∈C} f(c) · dT(c)
- This is the sum we want to minimize. We'll call it the cost, B(T), of the tree. Now we just need an algorithm that will build a tree with minimal cost.
70 Huffman Pseudocode
- Huffman(C)
- n = the size of C
- insert all the elements of C into Q,
- using the frequency of the node as the priority
- for i in 1..n-1 do
- z = a new tree node
- x = Extract-Minimum(Q)
- y = Extract-Minimum(Q)
- left child of z = x
- right child of z = y
- f[z] = f[x] + f[y]
- Insert(Q, z)
- end for
- return Extract-Minimum(Q) as the complete tree
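The pseudocode above maps directly onto Python's heapq priority queue. This is an illustrative sketch: the tie-break counter and the tuple tree representation are implementation choices, not part of the slides.

```python
import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman code. freq: dict symbol -> frequency.

    Returns a dict symbol -> codeword string.
    """
    tick = count()  # tie-breaker so the heap never has to compare trees
    # Q holds (frequency, tiebreak, tree); a tree is a symbol or a (left, right) pair.
    q = [(f, next(tick), s) for s, f in freq.items()]
    heapq.heapify(q)
    for _ in range(len(freq) - 1):
        fx, _, x = heapq.heappop(q)      # x = Extract-Minimum(Q)
        fy, _, y = heapq.heappop(q)      # y = Extract-Minimum(Q)
        heapq.heappush(q, (fx + fy, next(tick), (x, y)))  # f[z] = f[x] + f[y]
    _, _, root = q[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # interior node: 0 left, 1 right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the symbol's codeword
            codes[node] = prefix or "0"
    walk(root, "")
    return codes
```

Each of the C-1 iterations does two deletes and one insert at O(log C) each, matching the O(C log C) total from slide 52.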
71 Proving Optimality
- Greedy-choice property
- building an optimal tree can begin by merging the two lowest-frequency characters
- Optimal-substructure property
(Figure: an optimal tree contains an optimal subtree.)
72 Greedy-Choice Property (1/2)
- Let x and y be two characters with the lowest frequencies.
- Prove that there exists an optimal code tree where x and y appear as sibling leaves of maximum depth in the tree.
73 Greedy-Choice Property (2/2)
(Figure: the exchange argument, transforming an optimal tree T into trees T' and T'' by swapping x and y with the deepest sibling leaves b and c, without increasing the cost.)
74 Optimal-Substructure Property
(Figure: replacing sibling leaves x and y in T with a single leaf c of frequency f(c) = f(x) + f(y) yields a smaller tree T' that must be optimal for the reduced alphabet.)
75 Conclusion
- Greedy algorithms are
- simple
- easy to invent
- easy to implement
- efficient
- but they do not always yield optimal solutions
- they require the greedy-choice and optimal-substructure properties for optimality