Boyer-Moore String Searching Algorithm

Provided by: RustySha7

Learn more at: https://www.cs.utexas.edu

1
Boyer-Moore String Searching Algorithm

By Matthew Brown

2
String-Searching Algorithms

The goal of any string-searching algorithm is to
determine whether or not a match of a particular
string exists within another (typically much
longer) string.
Many such algorithms exist, with varying
efficiencies.
String-searching algorithms are important to a
number of fields, including computational
biology, computer science, and mathematics.

3
The Boyer-Moore String Search Algorithm

Developed in 1977, the B-M string search
algorithm is a particularly efficient algorithm,
and has served as a standard benchmark for string
search algorithms ever since.
This algorithm's execution time can be
sub-linear, as not every character of the string
to be searched needs to be checked.
Generally speaking, the algorithm gets faster as
the target string becomes longer.

4
How does it work?

The B-M algorithm takes a backward approach:
the target string is aligned with the start of
the check string, and the last character of the
target string is checked against the
corresponding character in the check string.
In the case of a match, the second-to-last
character of the target string is compared to the
corresponding check string character, and so on
toward the front. (No gain in efficiency over the
brute-force method.)
In the case of a mismatch, the algorithm computes
a new alignment for the target string based on
the mismatch. This is where the algorithm gains
considerable efficiency.
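
The backward comparison at a single alignment can be sketched as follows. This is a minimal illustration, not the presenter's code; the function name mismatch_position and its parameters are assumptions made for the example.

```python
def mismatch_position(target, check, start):
    """Compare target against check[start:] from right to left.

    Returns the index in target of the first mismatch found,
    or -1 if every character matched.
    """
    j = len(target) - 1
    # Walk backward while the characters agree.
    while j >= 0 and target[j] == check[start + j]:
        j -= 1
    return j


# The last character of "rockstar" (index 7) is compared first.
print(mismatch_position("rockstar", "-------x-----", 0))
```

A return value of -1 means the target string occurs at that alignment; any other value is where the new alignment is computed from.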

5
An example

Target string: rockstar
Check string: -------x-----
Aligning the start of each string pairs the last
character of the target string, r, with x.
Since x is not a character in rockstar, no
alignment that overlaps this x can match, so the
next alignment worth checking begins just after
x, and the B-M algorithm skips everything in
between.
This rules out several (7, in this case) further
alignments, at the cost of a single comparison
between two characters (r and x).
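
The skip in this example can be reproduced with a few lines. This is a sketch under stated assumptions: the helper skip_when_absent is a hypothetical name, and it handles only the case shown on the slide (the inspected character is absent from the target string), falling back to a shift of one otherwise.

```python
def skip_when_absent(target, check, start=0):
    """Inspect the check-string character under the target's last
    character and return (that character, next alignment start)."""
    m = len(target)
    ch = check[start + m - 1]
    if ch not in target:
        # No alignment overlapping ch can match: jump past it.
        return ch, start + m
    # Conservative fallback for this sketch only.
    return ch, start + 1


ch, next_start = skip_when_absent("rockstar", "-------x-----")
skipped = next_start - 1  # alignments 1 .. next_start - 1 are never tried
print(ch, next_start, skipped)
```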

6
Efficiency of the B-M Algorithm

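For a target string of length M and a check
string of length N, the B-M algorithm typically
needs far fewer than N comparisons, approaching
N/M when mismatched characters rarely occur in
the target string.
In the best case, only one in M characters needs
to be checked.
In the worst case, about 3N comparisons are
needed when the target string does not occur in
the check string, leading to a complexity of
O(N). Reporting every match can take longer
unless an extra rule (such as Galil's) is added.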
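
The N/M best case can be observed by counting comparisons. The sketch below is a simplification and an assumption of this example: it shifts using only the check-string character under the target's last position (the Horspool variant), not the full two-table B-M rule, so its worst case is not the 3N bound quoted above.

```python
def count_comparisons(target, check):
    """Return (match positions, number of character comparisons)
    for a simplified, bad-character-only search."""
    m, n = len(target), len(check)
    if m == 0:
        raise ValueError("target must be non-empty")
    # Distance from the last position, for every character except the last.
    shift = {ch: m - 1 - i for i, ch in enumerate(target[:-1])}
    matches, comparisons, pos = [], 0, 0
    while pos <= n - m:
        j = m - 1
        while j >= 0:
            comparisons += 1
            if target[j] != check[pos + j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Characters not in the table allow a full-length shift.
        pos += shift.get(check[pos + m - 1], m)
    return matches, comparisons


# 24 characters, none of them in "rockstar": 24 / 8 = 3 comparisons.
print(count_comparisons("rockstar", "x" * 24))
```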

7
Pre-processing Tables

The B-M algorithm computes 2 preprocessing tables
to determine the next suitable alignment after
each failed verification.
The first table calculates how many positions
ahead of the current position to start the next
search (based on the character which caused the
failed verification).
The second table makes a similar calculation
based on how many characters were matched
successfully before a failed verification.
These tables are often referred to as jump
tables, though this leads to some ambiguity with
the more common meaning of the term in computer
science, which refers to an efficient way of
transferring control from one part of a program
to another.
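
How the two tables drive the search can be sketched as below. Assumptions of this example, not taken from the slides: the function name search_with_tables, the rule of taking the larger of the two suggested shifts, and the extra full_match_shift parameter used after a complete match. The tables are entered by hand for the target string ANPANMAN, with table 1 holding each character's distance from the right-most character.

```python
def search_with_tables(target, check, table1, table2, full_match_shift):
    """Return the start index of every occurrence of target in check.

    table1: character -> distance from the right-most target character.
    table2: list indexed by the number of characters matched before failure.
    full_match_shift: hypothetical parameter, the shift after a full match.
    """
    m, n = len(target), len(check)
    matches, pos = [], 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and target[j] == check[pos + j]:
            j -= 1
        if j < 0:
            matches.append(pos)
            pos += full_match_shift
            continue
        matched = m - 1 - j
        # Table 1 shift, adjusted for the characters already matched.
        shift1 = table1.get(check[pos + j], m) - matched
        shift2 = table2[matched]
        pos += max(shift1, shift2, 1)
    return matches


table1 = {"N": 0, "A": 1, "M": 2, "P": 5}
table2 = [1, 8, 3, 6, 6, 6, 6, 6]
hits = search_with_tables("ANPANMAN", "XXANPANMANXXANPANMAN", table1, table2, 6)
print(hits)  # [2, 12]
```

Taking the maximum is safe because each table only rules out alignments that cannot match, so the larger shift never skips an occurrence.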

8
Calculation of Preprocessing Tables

Table 1
Starting at the last character of the target
string, move left toward the first character. At
each character, if the character is not already
in the table, add it to the table.
This character's shift value is equal to its
distance from the right-most character in the
string.
All other characters receive a shift value equal
to the total length of the string.
Example: peterpan would produce the following
table (character, shift):
(A, 1), (P, 2), (R, 3), (E, 4), (T, 5),
(all other characters, 8)
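
The construction of Table 1 can be sketched directly from the steps above. The function names are assumptions of this example. Following the rule literally, the final character (N) is also recorded, with distance 0, although the slide's list does not show it.

```python
def first_table(target):
    """Map each character to its distance from the right-most
    character, scanning from right to left and keeping only the
    first (right-most) occurrence of each character."""
    m = len(target)
    table = {}
    for distance in range(m):
        ch = target[m - 1 - distance]
        if ch not in table:
            table[ch] = distance
    return table


def shift_for(table, ch, target_length):
    """Characters absent from the table get the full target length."""
    return table.get(ch, target_length)


table = first_table("PETERPAN")
print(table)
print(shift_for(table, "X", len("PETERPAN")))
```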

9
Calculation of Preprocessing Tables

Table 2
First, for each value of i less than the length
of the target string, take the pattern formed by
the last i characters of the target string,
preceded by a mismatch for the character before
them.
Then, determine the least number of characters
the partial pattern must be shifted left before
the two patterns match.
Example: for ANPANMAN, the table would be
(i, pattern, shift):
(0, (-N), 1)
(1, (-A)N, 8)
(2, (-M)AN, 3)
(3, (-N)MAN, 6)
(4, (-A)NMAN, 6)
(5, (-P)ANMAN, 6)
(6, (-N)PANMAN, 6)
(7, (-A)NPANMAN, 6)
(here, -X means "not X")

10
Comparison of String Searching Algorithm
Complexities

Boyer-Moore: O(n)
Naïve string search algorithm: O((n-m+1)m)
Bitap algorithm: O(mn)
Rabin-Karp string search algorithm: average
O(n+m)
(n = length of search string, m = length of
target string)

11
About the Creators

Robert Boyer is a Professor Emeritus of
the University of Texas at Austin Computer
Science Department. He received his BA and PhD
in mathematics at UT Austin, and has authored and
co-authored several books concerning automatic
theorem-proving.

J. Strother Moore is Admiral B.R. Inman
Centennial Chair in Computer Theory of the
Department of Computer Sciences at UT Austin. He
received his BS in mathematics from MIT in 1970,
and his PhD in computational logic from the
University of Edinburgh in 1973. He has authored
and co-authored several books concerning
automatic theorem-proving, some of them in
cooperation with Robert Boyer.