Title: CS222 Algorithms First Semester 2003/2004
1CS222 Algorithms First Semester 2003/2004
- Dr. Sanath Jayasena
- Dept. of Computer Science & Eng.
- University of Moratuwa
- Lecture 7 (28/10/2003)
- String Matching Part 2
- Greedy Approach
2Overview
- Previous lecture: String Matching Part 1
- Naïve Algorithm, Rabin-Karp Algorithm
- This lecture
- String Matching Part 2
- String Matching using Finite Automata
- Knuth-Morris-Pratt (KMP) Algorithm
- Greedy Approach to Algorithm Design
3String Matching
4Finite Automata
- A finite automaton M is a 5-tuple (Q, q0, A, Σ, δ), where
- Q is a finite set of states
- q0 ∈ Q is the start state
- A ⊆ Q is a set of accepting states
- Σ is a finite input alphabet
- δ is the transition function that gives the next state for a given current state and input
5How a Finite Automaton Works
- The finite automaton M begins in state q0
- Reads the characters of its input string (drawn from Σ) one at a time
- If M is in state q and reads input character a, M moves to state δ(q, a)
- If its current state q is in A, M is said to have accepted the string read so far
- An input string that is not accepted is said to be rejected
6Example
- Q = {0, 1}, q0 = 0, A = {1}, Σ = {a, b}
- δ(q, a) is shown in the transition table/diagram
- This accepts strings that end in an odd number of a's, e.g., abbaaa is accepted, aa is rejected
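Transition table for δ:
state   input a   input b
0       1         0
1       0         0
Transition diagram: state 0 moves to state 1 on a and stays in state 0 on b; state 1 (accepting) moves to state 0 on either a or b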
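The example automaton can be simulated in a few lines. This is a minimal sketch, assuming the transitions δ(0, a) = 1, δ(0, b) = 0, δ(1, a) = 0, δ(1, b) = 0, which is the table consistent with "accepts strings that end in an odd number of a's". The names `DELTA`, `START`, `ACCEPTING` and `accepts` are illustrative and do not come from the slides.

```python
# Transition function of the example automaton, stored as a dictionary.
# Keys are (state, input character); values are the next state.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 0, (1, "b"): 0,
}
START = 0          # q0
ACCEPTING = {1}    # A


def accepts(string):
    """Run the automaton on `string` and report whether it ends in A."""
    state = START
    for ch in string:
        state = DELTA[(state, ch)]
    return state in ACCEPTING


print(accepts("abbaaa"))  # True: ends in three a's
print(accepts("aa"))      # False: ends in two a's
```

State 1 simply records "the number of trailing a's seen so far is odd", which is why a single b or a second a sends the automaton back to state 0.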
7String-Matching Automata
- Given the pattern P[1..m], build a finite automaton M
- The state set is Q = {0, 1, 2, ..., m}
- The start state is 0
- The only accepting state is m
- Time to build M can be large if Σ is large
8String-Matching Automata contd
- Scan the text string T[1..n] to find all occurrences of the pattern P[1..m]
- String matching is efficient: Θ(n)
- Each character is examined exactly once
- Constant time for each character
- But time to compute δ is O(m|Σ|)
- δ has O(m|Σ|) entries
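To make the size of δ concrete, the sketch below builds the table directly from its definition: in state q (the first q pattern characters matched), reading character a leads to the length of the longest prefix of P that is a suffix of P[1..q] followed by a. This direct construction is deliberately simple and is slower than the O(m|Σ|) bound quoted above; it is only meant to show what the m + 1 rows of |Σ| entries contain. The function name and the example pattern `ababaca` over the alphabet {a, b, c} are assumptions made for illustration.

```python
def build_transition_function(pattern, alphabet):
    """Return delta as a list of dicts: delta[q][a] is the next state.

    delta[q][a] = length of the longest prefix of `pattern` that is a
    suffix of pattern[:q] + a (simple definition-based construction).
    """
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            candidate = pattern[:q] + a
            k = min(m, q + 1)
            # Shrink k until the k-character prefix is a suffix of candidate.
            while k > 0 and not candidate.endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


delta = build_transition_function("ababaca", "abc")
print(len(delta))     # 8 states: 0..7
print(delta[5]["c"])  # 6: "ababa" + "c" extends the match
print(delta[5]["b"])  # 4: "ababab" ends with the prefix "abab"
```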
9Algorithm
- Input: text string T[1..n], δ and m
- Result: all valid shifts displayed
- FINITE-AUTOMATON-MATCHER (T, m, δ)
- n ← length[T]
- q ← 0
- for i ← 1 to n
-     q ← δ(q, T[i])
-     if q = m
-         print "pattern occurs with shift" i − m
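A runnable sketch of FINITE-AUTOMATON-MATCHER follows. Python strings are 0-indexed, so after reading `text[i]` the shift is `i + 1 - m` rather than `i - m`. The helper `build_delta` is a simple definition-based construction added only so the example is self-contained; it assumes the alphabet is the set of characters in the pattern, and any other text character is treated as a transition to state 0. Both function names and the sample strings are illustrative.

```python
def build_delta(pattern):
    """Definition-based transition table over the pattern's own characters."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in set(pattern):
            candidate = pattern[:q] + a
            k = min(m, q + 1)
            while k > 0 and not candidate.endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def finite_automaton_matcher(text, m, delta):
    """Return all 0-based shifts at which the pattern occurs in text."""
    shifts = []
    q = 0
    for i, ch in enumerate(text):
        # A character that never occurs in the pattern resets the match.
        q = delta[q].get(ch, 0)
        if q == m:
            shifts.append(i + 1 - m)
    return shifts


pattern = "ababaca"
print(finite_automaton_matcher("abababacaba", len(pattern), build_delta(pattern)))  # [2]
```

Each text character costs one table lookup, which is where the Θ(n) matching time comes from once δ is available.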
10Knuth-Morris-Pratt (KMP) Method
- Avoids computing δ (transition function)
- Instead computes a prefix function π in O(m) time
- π has only m entries
- Prefix function stores info about how the pattern matches against shifts of itself
- Can avoid testing useless shifts
11Terminology/Notations
- String w is a prefix of string x if x = wy for some string y (e.g., srilan is a prefix of srilanka)
- String w is a suffix of string x if x = yw for some string y (e.g., anka is a suffix of srilanka)
- The k-character prefix of the pattern P[1..m] is denoted by P_k
- E.g., P_0 = ε, P_m = P = P[1..m]
12Prefix Function for a Pattern
- Given that pattern prefix P[1..q] matches text characters T[s+1..s+q], what is the least shift s' > s such that
- P[1..k] = T[s'+1..s'+k], where s' + k = s + q?
- At the new shift s', no need to compare the first k characters of P with the corresponding characters of T
- Since we know that they match
13Prefix Function Example 1
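- T = b a c b a b a b a a b c b a
- P = a b a b a c a
- At shift s = 4, q = 5 characters match: P[1..5] = T[5..9] = a b a b a
- At the next candidate shift s' = s + 2 = 6, the first k = 3 characters are already known to match: P[1..3] = T[7..9] = a b a
- P_q = P_5 = a b a b a and P_k = P_3 = a b a
- Compare the pattern against itself: the longest prefix of P that is also a proper suffix of P_5 is P_3, so π[5] = 3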
14Prefix Function Example 2
i      1  2  3  4  5  6  7  8  9  10
P[i]   a  b  a  b  a  b  a  b  c  a
π[i]   0  0  1  2  3  4  5  6  0  1
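The table can be reproduced with a short prefix-function routine. This is a sketch in the usual textbook style, not necessarily the version used in the lecture; the function name is illustrative. Because Python lists are 0-indexed, `pi[q]` here corresponds to π[q + 1] in the slide notation.

```python
def compute_prefix_function(pattern):
    """pi[q] = length of the longest proper prefix of pattern[:q + 1]
    that is also a suffix of pattern[:q + 1]."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # number of pattern characters currently matched against itself
    for q in range(1, m):
        # Fall back to shorter borders until the next character fits.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("ababababca"))  # [0, 0, 1, 2, 3, 4, 5, 6, 0, 1]
print(compute_prefix_function("ababaca"))     # [0, 0, 1, 2, 3, 0, 1]
```

The second call uses the pattern from Example 1; its fifth entry is 3, matching π[5] = 3.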
15Knuth-Morris-Pratt (KMP) Algorithm
- Information stored in prefix function
- Can speed up both the naïve algorithm and the finite-automaton matcher
- KMP Algorithm on the board
- 2 parts: KMP-MATCHER, PREFIX
- Running time
- PREFIX takes O(m)
- KMP-MATCHER takes O(m + n)
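Since the algorithm itself was presented on the board, the following is only a sketch of the two parts in the common textbook form, and may differ in detail from what was shown in class. The function names and the sample strings are illustrative; shifts are reported 0-based.

```python
def compute_prefix_function(pattern):
    """PREFIX part: pi[q] is the longest border length of pattern[:q + 1]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text, pattern):
    """KMP-MATCHER part: return all 0-based shifts where pattern occurs."""
    m = len(pattern)
    if m == 0:
        return []
    pi = compute_prefix_function(pattern)
    shifts = []
    q = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, reuse what is known about the pattern
        # instead of re-reading text characters.
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]
        if pattern[q] == ch:
            q += 1
        if q == m:
            shifts.append(i + 1 - m)
            q = pi[q - 1]  # keep looking for overlapping occurrences
    return shifts


print(kmp_matcher("abababcabab", "abab"))        # [0, 2, 7]
print(kmp_matcher("bacbababaabcba", "ababaca"))  # []
```

The text index `i` never moves backwards and `q` can only fall as often as it has risen, which is the intuition behind the linear running time.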
16Greedy Approach to Algorithm Design
17Introduction
- Greedy methods typically apply to optimization problems in which a set of choices must be made to arrive at an optimal solution
- Optimization problem
- There can be many solutions
- Each solution has a value
- We wish to find a solution with the optimal (minimum or maximum) value
18Example Optimization Problems
- How to give a balance in the minimum number of coins?
- How to allocate resources to maximize profit from your business?
- A thief has a knapsack of capacity c: what items to put in it to maximize profit?
- 0-1 knapsack problem (binary choice)
- Fractional knapsack problem
19Greedy Approach
- Make each choice in a locally optimal manner
- Always makes the choice that looks best at the moment
- We hope that this will lead to a globally optimal solution
- Greedy method doesn't always give optimal solutions, but for many problems it does
20Example
- A cashier gives change using coins of Rs. 10, 5, 2 and 1
- Suppose the amount is Rs. 37
- Need to minimize the number of coins
- Try to use the largest coin to cover the remaining balance
- So, we get 10 + 10 + 10 + 5 + 2
- Does this give the optimal solution?
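One way to explore the closing question is to compare the greedy answer with an exhaustive minimum computed by dynamic programming. The sketch below does this for the Rs. 37 example. It also tries a hypothetical coin system of 4, 3 and 1 (not from the slides) to show that the same greedy rule can fail for other denominations. The function names are illustrative, and `greedy_change` assumes a coin of value 1 exists so that the amount always reaches zero.

```python
def greedy_change(amount, coins):
    """Repeatedly take the largest coin that fits the remaining balance."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            result.append(coin)
            amount -= coin
    return result


def min_coins(amount, coins):
    """Minimum number of coins for `amount`, by dynamic programming."""
    best = [0] + [float("inf")] * amount
    for value in range(1, amount + 1):
        for coin in coins:
            if coin <= value:
                best[value] = min(best[value], best[value - coin] + 1)
    return best[amount]


print(greedy_change(37, [10, 5, 2, 1]))  # [10, 10, 10, 5, 2]
print(min_coins(37, [10, 5, 2, 1]))      # 5, so greedy is optimal here

# Hypothetical denominations where the greedy rule is not optimal:
print(greedy_change(6, [4, 3, 1]))       # [4, 1, 1] uses 3 coins
print(min_coins(6, [4, 3, 1]))           # 2, using 3 + 3
```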
21Elements of Greedy Approach
- Greedy-choice property
- A globally optimal solution can be arrived at by making a locally optimal (greedy) choice
- Proving this may not be trivial
- Optimal substructure
- Optimal solution to the problem contains within it optimal solutions to subproblems
22Applications of Greedy Approach
- Graph algorithms
- Minimum spanning tree
- Shortest path
- Data compression
- Huffman coding
- Activity selection (scheduling) problems
- Fractional knapsack problem
- Not the 0-1 knapsack problem
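The contrast between the last two items can be seen on a small instance. The sketch below applies the same greedy rule (highest value per unit weight first) to both knapsack variants. The item data and capacity are hypothetical illustration values, not taken from the lecture, and the brute-force check is included only to confirm the 0-1 optimum on this tiny input.

```python
from itertools import combinations


def fractional_knapsack(items, capacity):
    """items are (value, weight) pairs; fractions of an item may be taken."""
    total = 0.0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if capacity <= 0:
            break
        take = min(weight, capacity)
        total += value * take / weight
        capacity -= take
    return total


def greedy_zero_one(items, capacity):
    """Same greedy order, but each item is taken whole or not at all."""
    total = 0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= capacity:
            total += value
            capacity -= weight
    return total


def best_zero_one(items, capacity):
    """Exhaustive 0-1 optimum; only practical for a handful of items."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for _, w in subset) <= capacity:
                best = max(best, sum(v for v, _ in subset))
    return best


items = [(60, 10), (100, 20), (120, 30)]  # hypothetical (value, weight) pairs
print(fractional_knapsack(items, 50))  # 240.0
print(greedy_zero_one(items, 50))      # 160
print(best_zero_one(items, 50))        # 220
```

In the fractional case the leftover capacity is always filled with the next best item, so the greedy choice loses nothing; in the 0-1 case the greedy pick leaves 20 units of capacity unused and misses the better pair.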
23Announcements
- Assignment 4
- assigned today
- due next week
- Next 2 lectures
- Topic: Graphs
- By Ms Sudanthi Wijewickrema