A Fast String Matching Algorithm - PowerPoint PPT Presentation

Provided by: banyanCm9

Transcript and Presenter's Notes

Title: A Fast String Matching Algorithm


1
A Fast String Matching Algorithm
  • The Boyer Moore Algorithm

2
The obvious search algorithm
  • Considers each character position of str and
    determines whether the successive patlen
    characters of str match pat.
  • In the worst case, the number of comparisons is on
    the order of i * patlen, where i is the position
    in str at which the match is found.
  • Ex. pat = aab, str = ...aaaaac...
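The obvious search can be sketched in C as follows. This is a minimal illustration, not code from the presentation; the name naive_search and the 0-based return value are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Obvious search: try every alignment of pat against str, left to right.
 * Returns the 0-based index of the first match, or -1 if there is none.
 * (naive_search is a hypothetical name used only for this sketch.) */
long naive_search(const char *pat, const char *str)
{
    size_t patlen = strlen(pat);
    size_t len = strlen(str);

    if (patlen > len)
        return -1;

    for (size_t i = 0; i + patlen <= len; i++) {
        size_t j = 0;
        /* Compare the patlen characters of str that start at position i. */
        while (j < patlen && str[i + j] == pat[j])
            j++;
        if (j == patlen)
            return (long)i;
    }
    return -1;
}
```

With pat = aab and a str made of a long run of a's, nearly every alignment compares two or three characters before failing, which is where the i * patlen bound comes from.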

3
Knuth-Pratt-Morris Algorithm
  • Linear search algorithm.
  • Preprocesses pat in time linear in patlen and
    searches str in time linear in i + patlen.
  • Ex. pat = EXAMPLE, str = HERE IS A SIMPLE EXAMPLE
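A compact C sketch of the Knuth-Pratt-Morris idea (more often written Knuth-Morris-Pratt) is given below to make the two linear phases concrete. The slides do not show an implementation, so the function name kmp_search, the failure-table layout, and the 0-based return value are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the 0-based index of the first match of pat in str, or -1.
 * fail[k] is the length of the longest proper prefix of pat[0..k]
 * that is also a suffix of pat[0..k].
 * Note: this sketch also returns -1 if the table cannot be allocated. */
long kmp_search(const char *pat, const char *str)
{
    size_t patlen = strlen(pat);
    size_t n = strlen(str);
    long result = -1;

    if (patlen == 0)
        return 0;

    size_t *fail = malloc(patlen * sizeof *fail);
    if (fail == NULL)
        return -1;

    /* Preprocessing: linear in patlen. */
    fail[0] = 0;
    for (size_t k = 1, len = 0; k < patlen; k++) {
        while (len > 0 && pat[k] != pat[len])
            len = fail[len - 1];
        if (pat[k] == pat[len])
            len++;
        fail[k] = len;
    }

    /* Search: the index i into str never moves backwards. */
    for (size_t i = 0, j = 0; i < n; i++) {
        while (j > 0 && str[i] != pat[j])
            j = fail[j - 1];
        if (str[i] == pat[j])
            j++;
        if (j == patlen) {
            result = (long)(i + 1 - patlen);
            break;
        }
    }

    free(fail);
    return result;
}
```

Unlike the right-to-left approach described next, this search still inspects every character of str up to the match.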
4
Characteristics of Boyer Moore Algorithm
  • Basic idea: match the pattern against the string
    from the right rather than from the left.
  • Expected cost: c * (i + patlen), with c < 1.
  • Preprocess pat to compute two tables, delta1 and
    delta2, which determine how far to shift pat and
    the pointer into str.
  • Ex. pat = AT-THAT, str =
    WHICH-FINALLY-HALTS.--AT-THAT-POINT

5
Informal Description
  • Compare the last char of pat with the
    patlen-th char of str.
  • AT-THAT
  • WHICH-FINALLY-HALTS.--AT-THAT-POINT
  • Observation 1: if char is known not to occur in
    pat, skip patlen (= delta1(F)) chars of str.

6
Informal Description
  • Observation 2: if char does occur in pat, slide
    pat down delta1(-) positions so that char is
    aligned with the corresponding character in pat.
  • delta1(char) = patlen if char does not occur in
    pat, else patlen - j, where j is the maximum
    integer such that pat(j) = char.
  • AT-THAT
  • WHICH-FINALLY-HALTS.--AT-THAT-POINT
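The delta1 definition above translates almost directly into C. The following is a sketch under the assumption of a 256-entry byte alphabet; build_delta1 and ALPHABET_SIZE are names chosen for this example, not taken from the slides.

```c
#include <stddef.h>
#include <string.h>

#define ALPHABET_SIZE 256  /* assumption: one table entry per byte value */

/* delta1[c] = patlen           if c does not occur in pat,
 *           = patlen - j       otherwise, where j is the 1-based position
 *                              of the rightmost occurrence of c in pat. */
void build_delta1(const char *pat, size_t delta1[ALPHABET_SIZE])
{
    size_t patlen = strlen(pat);

    for (size_t c = 0; c < ALPHABET_SIZE; c++)
        delta1[c] = patlen;

    /* Scanning left to right lets later (more rightward) occurrences
     * overwrite earlier ones, so the maximum j wins. */
    for (size_t j = 1; j <= patlen; j++)
        delta1[(unsigned char)pat[j - 1]] = patlen - j;
}
```

For pat = AT-THAT this gives delta1(T) = 0, delta1(A) = 1, delta1(H) = 2, delta1(-) = 4, and 7 for every other character, such as F.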
7
Informal Description
  • Observation 3a: the last m chars of pat match str,
    and a mismatch then occurs at some new char.
    Move strptr by delta1(L) (pat is shifted by
    delta1(L) - m).
  • AT-THAT
  • FINALLY-HALTS.--AT-THAT-POINT

8
Informal Description
  • Observation 3b: the final m chars of pat (a
    subpat) have matched; find the rightmost plausible
    reoccurrence of the subpat and align it with the
    matched m chars of str (move strptr by delta2(j),
    where j is the position in pat of the mismatch).
  • AT-THAT
  • FINALLY-HALTS.--AT-THAT-POINT

9
The delta1 and delta2 tables
  • The delta1 table has as many entries as there are
    chars in the alphabet.
  • Ex. pat     a b c d e
  •     delta1  4 3 2 1 0   (else 5)
  • Ex. pat     a t - t h a t
  •     delta1  1 0 4 0 2 1 0   (else 7)
  • The delta2 table has as many entries as there are
    chars in pat.
  • delta2(j) = (j + 1 - rpr(j)) + (patlen - j)
              = patlen + 1 - rpr(j)
  • Ex. pat     a  b  c  d  e
  •     delta2  9  8  7  6  1
  • Ex. pat     a  t  -  t  h  a  t
  •     delta2  11 10 9  8  7  4  1
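To make the formula concrete, delta2 can be computed by brute force straight from its definition. The sketch below assumes the usual meaning of rpr(j): the greatest k such that the subpat pat(j+1)...pat(patlen) matches pat starting at position k (positions left of the pattern match anything) and either k <= 1 or pat(k-1) differs from pat(j). The function names are hypothetical, and the quadratic search is meant for checking small examples, not for production use.

```c
#include <string.h>

/* rpr(j) for a 1-based j: rightmost plausible reoccurrence of the subpat
 * pat(j+1)..pat(patlen). The result may be zero or negative when the
 * reoccurrence "falls off" the left end of pat. */
static long rpr(const char *pat, long patlen, long j)
{
    long sublen = patlen - j;

    for (long k = j; ; k--) {
        int unifies = 1;
        for (long t = 0; t < sublen; t++) {
            long pos = k + t;  /* 1-based position in pat; <= 0 is a wildcard */
            if (pos >= 1 && pat[pos - 1] != pat[j + t]) {
                unifies = 0;
                break;
            }
        }
        /* Plausible only if the preceding char differs from pat(j). */
        if (unifies && (k <= 1 || pat[k - 2] != pat[j - 1]))
            return k;
        /* The loop always ends: at k = 1 - sublen every position is a wildcard. */
    }
}

/* Fills delta2[j - 1] = patlen + 1 - rpr(j) for j = 1..patlen. */
void build_delta2(const char *pat, long *delta2)
{
    long patlen = (long)strlen(pat);

    for (long j = 1; j <= patlen; j++)
        delta2[j - 1] = patlen + 1 - rpr(pat, patlen, j);
}
```

For pat = abcde this yields 9 8 7 6 1, because no subpat reoccurs and every rpr(j) below patlen falls off the left end.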

10
The algorithm
  • stringlen ← length of string.
  • i ← patlen.
  • top: if i > stringlen then return false.
  • j ← patlen.
  • loop: if j = 0 then return i + 1.
  • if string(i) = pat(j)
  • then
  • j ← j - 1
  • i ← i - 1
  • goto loop.
  • close
  • i ← i + max(delta1(string(i)), delta2(j))
  • goto top.
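The goto-based pseudocode above maps onto two nested loops in C. This sketch keeps the 1-based indices i and j of the slides internally and takes the two tables as parameters, so it assumes the caller has already built delta1 (256 entries, indexed by byte value) and delta2 (patlen entries, indexed by j - 1). The name bm_search and the 0-based return value are illustrative choices.

```c
#include <stddef.h>

/* Returns the 0-based index of the first match of pat in string, or -1.
 * delta1 and delta2 are assumed to be precomputed for pat. */
long bm_search(const char *pat, long patlen,
               const char *string, long stringlen,
               const long delta1[256], const long *delta2)
{
    long i = patlen;                      /* 1-based index into string */

    if (patlen == 0)
        return 0;

    while (i <= stringlen) {              /* "top" */
        long j = patlen;                  /* 1-based index into pat */

        /* "loop": compare right to left while characters agree. */
        while (j > 0 && string[i - 1] == pat[j - 1]) {
            j--;
            i--;
        }
        if (j == 0)
            return i;                     /* slides return i + 1 (1-based) */

        /* Mismatch: advance the string pointer by the larger shift. */
        long d1 = delta1[(unsigned char)string[i - 1]];
        long d2 = delta2[j - 1];
        i += (d1 > d2) ? d1 : d2;
    }
    return -1;
}
```

On the AT-THAT example the string pointer visits positions 7, 14, 18, 24 and 29 before the match is confirmed, so most characters of the string are never inspected.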

12
Performance (empirical evidence)
13
The Implementation in mstring.c
  • Function: make_skip(char *, int)
  • Purpose: create the skip (delta1) table
  • Function inputs: char *ptrn, int plen
  • Local variables: int *skip, *sptr
  • Return: int *skip
  • Function: make_shift(char *, int)
  • Purpose: create the shift (delta2) table
  • Function inputs: char *ptrn, int plen
  • Local variables: int *shift, *sptr; char *pptr, c
  • Return: int *shift

14
Flowchart of make_skip()
  • Allocate memory to skip.
  • Set every entry: *skip = plen + 1.
  • plen == 0?
  • true: return skip.
  • false: skip[*ptrn++] = plen--, then test plen again.
15
make_skip()
  • int *make_skip(char *ptrn, int plen)
  • {
  •     int *skip = (int *) malloc(256 * sizeof(int));
  •     int *sptr = &skip[256];
  •     if (skip == NULL)
  •         FatalPrintError("malloc");
  •     while (sptr-- != skip)
  •         *sptr = plen + 1;
  •     while (plen != 0)
  •         skip[(unsigned char) *ptrn++] = plen--;
  •     return skip;
  • }
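To try the table construction outside of mstring.c, a standalone variant can be sketched as below. It is not the original function: FatalPrintError() is replaced by returning NULL, the first loop is written with an index instead of a pointer, and the name make_skip_sketch is made up for this example.

```c
#include <stdlib.h>

/* Standalone variant of make_skip() for experimentation.
 * Assumption: returns NULL on allocation failure instead of calling
 * FatalPrintError(). The caller frees the returned table. */
int *make_skip_sketch(const char *ptrn, int plen)
{
    int *skip = malloc(256 * sizeof *skip);
    if (skip == NULL)
        return NULL;

    /* Characters that do not occur in the pattern get plen + 1. */
    for (int c = 0; c < 256; c++)
        skip[c] = plen + 1;

    /* The first char gets plen, the last char gets 1; a later occurrence
     * of the same char overwrites an earlier one. */
    while (plen != 0)
        skip[(unsigned char)*ptrn++] = plen--;

    return skip;
}
```

For the pattern AT-THAT the entries come out as T = 1, A = 2, H = 3, - = 5 and 8 for everything else, that is, each one larger than the corresponding delta1 value given earlier for the same pattern.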

16
Procedures of make_shift()
  • Allocate memory to shift
  • c = ptrn[plen - 1]
  • Look for rpr of c
  • Look for two identical subpat
  • Assign values to shift
  • Return shift
17
make_shift()
  • int *shift = (int *) malloc(plen * sizeof(int));
  • int *sptr = shift + plen - 1;
  • char *pptr = ptrn + plen - 1;
  • char c;
  • if (shift == NULL)
  •     FatalPrintError("malloc");
  • c = ptrn[plen - 1];
  • *sptr = 1;

18
make_shift()
  • while (sptr-- != shift) {
  •     char *p1 = ptrn + plen - 2, *p2, *p3;
  •     do {
  •         while (p1 >= ptrn && *p1-- != c);
  •         p2 = ptrn + plen - 2;
  •         p3 = p1;
  •         while (p3 >= ptrn && *p3-- == *p2-- &&
                   p2 >= pptr);
  •     } while (p3 >= ptrn && p2 >= pptr); //
          p2 >= j, p3 >= 1
  •     *sptr = shift + plen - sptr + p2 - p3;
  •     pptr--;
  • }
  • return shift;

19
  • Ex. j = 5
  • j   1 2 3 4 5 6 7
  • Pat e d b c a b c
  • step1 p1
  • step2 p3 p2
  • step3 p3 p2
  • => delta2(j) = (p2 - p3) + (plen - j) = 5