Title: The Simplest NL Applications: Text Searching and Pattern Matching
1 The Simplest NL Applications: Text Searching and Pattern Matching
Read J&M, Chapter 2.
2 Searching for a Single String Using a Nondeterministic FSM
[Figure: FSM with states 1 to 8 and transitions labeled c, o, c, o, n, u, t; two further arc labels did not survive extraction.]
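A nondeterministic machine of this kind can be simulated by tracking the set of states it could be in after each input character. The sketch below is one minimal way to do that, not the slide's own construction: it assumes the start state loops on every symbol, numbers states from 0 rather than 1, and uses the hypothetical name `nfa_find`.

```python
def nfa_find(pattern: str, text: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    matches = []
    # State i means "the first i characters of pattern have just been read".
    active = {0}
    for pos, ch in enumerate(text):
        # Assumed self-loop on the start state: a match may begin anywhere.
        nxt = {0}
        for state in active:
            if pattern[state] == ch:
                if state + 1 == m:
                    matches.append(pos - m + 1)  # reached the final state
                else:
                    nxt.add(state + 1)
        active = nxt
    return matches
```

For instance, `nfa_find("coconut", "cocococonut")` follows several partial matches at once and reports only the one that reaches the final state.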
3 Searching for a Single String Using the Boyer-Moore Algorithm
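The slide gives no details of the algorithm, so the sketch below uses the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table and omits the good-suffix rule of the full algorithm. The function name `horspool_find` is hypothetical.

```python
def horspool_find(pattern: str, text: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # For each character in pattern[:-1], the distance from its rightmost
    # occurrence to the end of the pattern. Other characters shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    i = 0
    while i + m <= n:
        # Compare right to left, as in Boyer-Moore.
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            return i
        # Shift according to the text character under the pattern's last cell.
        i += shift.get(text[i + m - 1], m)
    return -1
```

The point of the technique is that a mismatch can move the pattern forward by several positions at once, so many text characters are never inspected.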
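4 Searching for Multiple Strings
[Figure: FSM with two branches. Labels recovered for one branch: l, then o, c, o, s (states 2 to 6); for the other: c, o, c, o, n, u, t (states 1 to 8). Some arc labels did not survive extraction.]
Example: lococonut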
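The same set-of-states simulation extends to several keywords at once. In the sketch below each active state is a pair (keyword index, characters matched so far). The keyword list `["locos", "coconut"]` used in the usage note is an assumption read off the figure residue, and `multi_find` is a hypothetical name.

```python
def multi_find(patterns: list[str], text: str) -> list[tuple[int, str]]:
    """Return (start index, keyword) for every keyword occurrence in text."""
    matches = []
    active = set()  # pairs (keyword index, number of characters matched)
    fresh = {(p, 0) for p in range(len(patterns))}
    for pos, ch in enumerate(text):
        nxt = set()
        # The start state is always active, so any keyword may begin here.
        for p, k in active | fresh:
            pat = patterns[p]
            if k < len(pat) and pat[k] == ch:
                if k + 1 == len(pat):
                    matches.append((pos - k, pat))  # keyword completed
                else:
                    nxt.add((p, k + 1))
        active = nxt
    return matches
```

On the slide's example, `multi_find(["locos", "coconut"], "lococonut")` follows the first keyword through "loco", abandons it at the second "c", and still finds "coconut" starting at index 2 because that branch was being tracked in parallel.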
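5 Converting to a Deterministic FSM
[Figure: the two-branch FSM from the previous slide. Labels recovered: l, then o, c, o, s (states 2 to 6); and c, o, c, o, n, u, t (states 1 to 8). Some arc labels did not survive extraction.]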
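One standard way to perform this conversion is the subset construction, in which each deterministic state is a set of nondeterministic states. The sketch below applies it to a single-keyword machine and assumes self-loops on the start and accepting states. The names `build_dfa` and `dfa_accepts` are hypothetical, and the alphabet must be supplied explicitly and must cover every character of the input.

```python
def build_dfa(pattern: str, alphabet: str):
    """Subset construction for the NFA that accepts any text containing pattern."""
    m = len(pattern)

    def step(states: frozenset, ch: str) -> frozenset:
        nxt = {0}                      # assumed loop on the start state
        for s in states:
            if s == m:
                nxt.add(m)             # assumed loop on the accepting state
            elif pattern[s] == ch:
                nxt.add(s + 1)
        return frozenset(nxt)

    start = frozenset({0})
    table = {}                         # DFA state -> {symbol: DFA state}
    todo = [start]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for ch in alphabet:
            target = step(current, ch)
            table[current][ch] = target
            if target not in table:
                todo.append(target)
    return start, table


def dfa_accepts(pattern: str, text: str, alphabet: str) -> bool:
    """Run the deterministic machine: one table lookup per input character."""
    start, table = build_dfa(pattern, alphabet)
    state = start
    for ch in text:
        state = table[state][ch]
    return len(pattern) in state
```

Once the table is built, scanning needs no set manipulation at run time, which is the practical payoff of determinizing.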
6 Regular Expressions
- Two different (but related) uses of the term:
  - Expressions that define all and only the regular languages: (aa ∪ ab ∪ ba ∪ bb)*
  - Expressions in a useful pattern language
Matching IP addresses: s!<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>!<inet>$1</inet>!
Finding doubled words: \<([A-Za-z]+)\s+\1\>
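The two pattern-language examples can be tried out with Python's `re` module. The sketch below is a translation made under assumptions: the bracket, brace, and quantifier characters lost in extraction are restored as `[0-9]+`, `{3}`, and `[A-Za-z]+`, and Python's `\b` stands in for the `\<` and `\>` word anchors, which `re` does not support. The names `tag_ip` and `doubled_words` are hypothetical.

```python
import re

# Four dot-separated runs of digits wrapped in <emphasis> tags.
INET = re.compile(r"<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>")

# A word, whitespace, then the same word again (backreference \1).
DOUBLED = re.compile(r"\b([A-Za-z]+)\s+\1\b")


def tag_ip(text: str) -> str:
    """Rewrite <emphasis>a.b.c.d</emphasis> as <inet>a.b.c.d</inet>."""
    return INET.sub(r"<inet>\1</inet>", text)


def doubled_words(text: str) -> list[str]:
    """Return each word that is immediately repeated."""
    return [m.group(1) for m in DOUBLED.finditer(text)]
```

The backreference `\1` is what takes the second pattern beyond the formal regular expressions of the first bullet: it requires remembering an arbitrary earlier match.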
7 REs: Syntax and Semantics
Syntax: The regular expressions over an alphabet Σ are all strings over the alphabet Σ ∪ {(, ), ∅, ∪, *} that can be obtained as follows:
1. ∅ and each member of Σ is a regular expression.
2. If α, β are regular expressions, then so is αβ.
3. If α, β are regular expressions, then so is α∪β.
4. If α is a regular expression, then so is α*.
5. If α is a regular expression, then so is (α).
6. Nothing else is a regular expression.
8 REs: Syntax and Semantics
Regular expressions define languages via a semantic interpretation function we'll call L:
1. L(∅) = ∅ and L(a) = {a} for each a ∈ Σ.
2. If α, β are regular expressions, then L(αβ) = L(α) L(β) = all strings that can be formed by concatenating to some string from L(α) some string from L(β).
3. If α, β are regular expressions, then L(α∪β) = L(α) ∪ L(β).
4. If α is a regular expression, then L(α*) = (L(α))*.
5. If (α) is a regular expression, then L((α)) = L(α).
A language is regular if and only if it can be described by a regular expression.
Note: L is compositional.
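Compositionality means that L of a whole expression is computed only from L of its parts, which makes the definition easy to turn into a recursive function. The sketch below is an illustration under assumptions: expressions are encoded as nested tuples (a hypothetical encoding), and since starred languages are infinite, only strings up to a chosen length are enumerated.

```python
def lang(expr, max_len: int) -> set[str]:
    """Strings of L(expr) with length at most max_len."""
    kind = expr[0]
    if kind == "empty":                      # L(∅) = ∅
        return set()
    if kind == "sym":                        # L(a) = {a}
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if kind == "cat":                        # L(αβ) = L(α) L(β)
        left, right = lang(expr[1], max_len), lang(expr[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "union":                      # L(α∪β) = L(α) ∪ L(β)
        return lang(expr[1], max_len) | lang(expr[2], max_len)
    if kind == "star":                       # L(α*) = (L(α))*
        base = lang(expr[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")


# (aa ∪ ab ∪ ba ∪ bb)* in the tuple encoding.
A, B = ("sym", "a"), ("sym", "b")
PAIRS = ("union",
         ("union", ("cat", A, A), ("cat", A, B)),
         ("union", ("cat", B, A), ("cat", B, B)))
EVEN_LENGTH = ("star", PAIRS)
```

Calling `lang(EVEN_LENGTH, 4)` yields exactly the strings over {a, b} of length 0, 2, or 4. Each clause of the function mirrors one clause of the definition of L.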
9 The Importance of Compositionality
What is the meaning of:
Mary cooked the yujutes.
Mary tyroked the yujutes.
10 Morphological Analysis
- Read J&M, Chapter 3
- Recognize words
11 Morphological Parsing
Goal: to represent the facts declaratively so that a single representation can be used for both recognition and generation.
Note: ^ marks morpheme boundaries. # marks word boundaries.
12 From Lexical to Intermediate
Note: All the transducers in the book are described as lexical:intermediate, but they can also run in the other direction.
13 Where Did reg-noun-stem Come From?
14 We Can Cascade or Compose
15 From Intermediate to Surface
For text, we need spelling rules.
ε → e / {x, s, z} ^ ___ s #
Read this as: Replace ε with e in the context after the /.
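The effect of this e-insertion rule can be sketched with a regular-expression substitution. This is a shortcut for illustration, not the finite-state transducer the slides go on to build. It assumes the J&M convention that `^` marks a morpheme boundary and `#` a word boundary in the intermediate form, and `e_insertion` is a hypothetical name.

```python
import re


def e_insertion(intermediate: str) -> str:
    """Map an intermediate form such as 'fox^s#' to its surface spelling."""
    # Insert e after a morpheme boundary that follows x, s, or z
    # and precedes a word-final s.
    with_e = re.sub(r"(?<=[xsz])\^(?=s#)", "^e", intermediate)
    # Boundary symbols do not appear on the surface.
    return with_e.replace("^", "").replace("#", "")
```

Under these assumptions, `e_insertion("fox^s#")` gives `foxes`, while `e_insertion("cat^s#")` gives `cats` because the left context does not match.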
16 Turning the Rule into a Transducer
Examples: foxes, xerox, foxsat
17 Disambiguation - Local
Local ambiguities:
[Slide examples: s, asses, luxury]
18 Disambiguation - Harder
Sometimes additional knowledge is necessary:
foxes: fox +N +PL or fox +V +SG
Can we think of nouns that cannot also be verbs?
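The ambiguity can be made concrete with a toy analyzer that returns every analysis its lexicon licenses. Everything in the sketch below is an assumption for illustration: the two small stem lists, the tag spellings, and the name `analyses`. It handles only the -s and -es suffixes.

```python
# Hypothetical toy lexicon; "fox" is listed as both a noun and a verb.
NOUN_STEMS = {"fox", "cat", "luxury"}
VERB_STEMS = {"fox", "walk"}
SIBILANTS = ("x", "s", "z")


def analyses(surface: str) -> list[str]:
    """Return every lexical analysis of surface that the toy lexicon allows."""
    results = []
    if surface in NOUN_STEMS:
        results.append(f"{surface} +N +SG")
    if surface in VERB_STEMS:
        results.append(f"{surface} +V")
    for suffix in ("es", "s"):
        if not surface.endswith(suffix):
            continue
        stem = surface[: -len(suffix)]
        # -es is allowed only after x, s, z; plain -s only elsewhere.
        if (suffix == "es") != stem.endswith(SIBILANTS):
            continue
        if stem in NOUN_STEMS:
            results.append(f"{stem} +N +PL")
        if stem in VERB_STEMS:
            results.append(f"{stem} +V +SG")
    return results
```

With this lexicon `analyses("foxes")` returns two readings and `analyses("cats")` only one. Choosing between the two readings of "foxes" requires information from outside the word, such as its syntactic context.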
19 Search
- For FSMs, we can build a deterministic machine.
- In other cases, we will have to search
- Depth-first
- Breadth-first chart parsing
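The difference between the two search orders comes down to how the agenda of pending choices is managed. The sketch below is a generic illustration, not the slides' algorithm, and it does not implement chart parsing. It searches the configurations (state, input position) of a keyword-spotting nondeterministic FSM, with the hypothetical names `search` and `nfa_problem`.

```python
from collections import deque


def search(start, successors, is_goal, depth_first=True):
    """Generic agenda search; returns a goal node, or None if there is none."""
    agenda = deque([start])
    seen = {start}
    while agenda:
        # Taking from the same end we add to gives depth-first order;
        # taking from the opposite end gives breadth-first order.
        node = agenda.pop() if depth_first else agenda.popleft()
        if is_goal(node):
            return node
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                agenda.append(nxt)
    return None


def nfa_problem(pattern: str, text: str):
    """Search problem over configurations of an NFA accepting texts that contain pattern."""
    m, n = len(pattern), len(text)

    def successors(config):
        state, pos = config
        if pos == n:
            return
        if state == 0:
            yield (0, pos + 1)              # assumed loop on the start state
        if state == m:
            yield (m, pos + 1)              # assumed loop on the accepting state
        elif pattern[state] == text[pos]:
            yield (state + 1, pos + 1)      # advance along the keyword

    def is_goal(config):
        return config == (m, n)

    return (0, 0), successors, is_goal
```

Both orders find the same accepting configuration here. They differ in how many dead-end configurations they visit first and in how much of the agenda they hold in memory.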
[Figure: parse trees with nodes S, VP, NP, PP, V, N, PR, PREP, DET for the sentence "I hit the boy with a bat."]