CPSC 503 Computational Linguistics - PowerPoint PPT Presentation

Slides: 47
Provided by: gcare
Transcript and Presenter's Notes


1
CPSC 503 Computational Linguistics
  • Lecture 4
  • Giuseppe Carenini

2
Today 1/23
  • Finite State Transducers (FSTs) and Morphological
    Parsing
  • Stemming (Porter Stemmer)

3
Computational problems in Morphology
  • Recognition: recognize whether a string is an
    English word (FSA)
  • Parsing/Generation: word ↔ stem, class, lexical
    features
    e.g., lies ↔ lie N PL
          lies ↔ lie V 3SG
  • Stemming: word → stem
4
Finite State Transducers (FSTs)
  • FSA cannot help
  • Need to extend FSA
  • Add another tape
  • Add extra symbols to the transitions
  • On one tape we read "cats"; on the other we write
    "cat N PL" (or vice versa)

5
FSTs as translators
[Figure: generation (lexical to surface) and parsing (surface to lexical)]
6
Example
[Figure: FST with states q0 to q7 and arc labels l:l, i:i, e:e, N:ε, PL:s, V:ε, 3SG:s]
  • Transitions (as a translator)
  • l:l means read an l on one tape and write an l on
    the other (or vice versa)
  • N:ε means read an N symbol on one tape and write
    nothing on the other (or vice versa)
  • PL:s means read PL and write an s (or vice
    versa)
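
The translator reading of the arcs can be sketched in Python. This is a minimal illustration, not the lecture's implementation: the arc list pairs a lexical symbol with a surface symbol, the empty string stands in for the "write nothing" case, and the wiring of states q0 to q7 is an assumption, since the transcript keeps only the labels of the figure. The names `ARCS`, `translate`, `generate`, and `parse` are hypothetical.

```python
from typing import Iterator

EPS = ""  # the empty string plays the role of "nothing" on a tape

# (state, lexical symbol, surface symbol, next state).
# Assumed topology: the transcript preserves only the labels.
ARCS = [
    ("q0", "l", "l", "q1"),
    ("q1", "i", "i", "q2"),
    ("q2", "e", "e", "q3"),
    ("q3", "N", EPS, "q4"),
    ("q4", "PL", "s", "q5"),
    ("q3", "V", EPS, "q6"),
    ("q6", "3SG", "s", "q7"),
]
START, FINALS = "q0", {"q5", "q7"}


def translate(symbols: list[str], read_lexical: bool) -> Iterator[list[str]]:
    """Yield every output sequence for `symbols` read from one tape."""

    def walk(state: str, pos: int, out: list[str]) -> Iterator[list[str]]:
        if pos == len(symbols) and state in FINALS:
            yield out
        for src, lex, surf, dst in ARCS:
            if src != state:
                continue
            read, write = (lex, surf) if read_lexical else (surf, lex)
            emitted = out + [write] if write != EPS else out
            if read == EPS:
                # Nothing to read on this arc: move without consuming input.
                yield from walk(dst, pos, emitted)
            elif pos < len(symbols) and symbols[pos] == read:
                yield from walk(dst, pos + 1, emitted)

    yield from walk(START, 0, [])


def generate(lexical: list[str]) -> list[str]:
    """Lexical tape to surface tape."""
    return ["".join(out) for out in translate(lexical, read_lexical=True)]


def parse(surface: str) -> list[str]:
    """Surface tape to lexical tape; may return several analyses."""
    return [" ".join(out) for out in translate(list(surface), read_lexical=False)]
```

With these arcs, `generate(["l", "i", "e", "V", "3SG"])` gives `["lies"]`, while `parse("lies")` returns both `"l i e N PL"` and `"l i e V 3SG"`, because two different paths reach an accept state.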

7
Examples (as a translator)
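[Figure: two tape pairs. First: surface tape "l i e s", lexical tape to be filled in. Second: lexical tape "l i e V 3SG", surface tape to be filled in.]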
8
Examples (as a recognizer and a generator)
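[Figure: two tape pairs. First (recognizer): lexical tape "l i e V 3SG" and surface tape "l i e s" are both given. Second (generator): both tapes are produced as output.]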
9
FST definition
  • Q: a finite set of states
  • I, O: input and output alphabets (which may
    include ε)
  • Σ: a finite alphabet of complex symbols i:o, i ∈ I
    and o ∈ O
  • q0: the start state
  • F: a set of accept/final states (F ⊆ Q)
  • A transition relation δ that maps Q × Σ to Q

10
FST can be used as
  • Translators: input one string from I, output
    another from O (or vice versa)
  • Recognizers: input a string from I×O
  • Generators: output a string from I×O
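
The formal definition and the recognizer role can be sketched together. This is an illustrative data structure under stated assumptions: the class name `FST`, its field names, and the small `lie_verb` machine are hypothetical, the empty string stands in for the empty symbol, and the transition table is deterministic, as in the definition that maps a state and a complex symbol to one state.

```python
from dataclasses import dataclass

EPS = ""  # stands in for the empty symbol


@dataclass
class FST:
    states: set   # Q
    start: str    # start state
    finals: set   # F, a subset of Q
    delta: dict   # (state, (i, o)) -> next state

    def accepts(self, pairs) -> bool:
        """Recognizer mode: consume a sequence of i:o pairs."""
        state = self.start
        for pair in pairs:
            if (state, pair) not in self.delta:
                return False
            state = self.delta[(state, pair)]
        return state in self.finals


# Hypothetical machine relating "l i e V 3SG" to "l i e s".
lie_verb = FST(
    states={"q0", "q1", "q2", "q3", "q4", "q5"},
    start="q0",
    finals={"q5"},
    delta={
        ("q0", ("l", "l")): "q1",
        ("q1", ("i", "i")): "q2",
        ("q2", ("e", "e")): "q3",
        ("q3", ("V", EPS)): "q4",
        ("q4", ("3SG", "s")): "q5",
    },
)

pairs = [("l", "l"), ("i", "i"), ("e", "e"), ("V", EPS), ("3SG", "s")]
```

Here `lie_verb.accepts(pairs)` is true, and dropping the last pair makes it false, since the machine stops short of an accept state.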

Terminology warning!
11
A step back: FSAs can represent morphological
knowledge
  • Lexicon: list of stems and affixes, together with
    basic information about them
  • Morphotactics: the rules governing the ordering
    of morphemes
  • Orthographic rules: model changes in morphemes
    when they combine

12
FSA for inflectional morphology of plural
[Figure: FSA with arcs for some regular nouns and some irregular nouns]
13
FST for inflectional morphology of plural
[Figure: FST with arcs for some regular nouns and some irregular nouns, including an o:i arc]
14
Examples
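[Figure: two tape pairs. First: surface tape "m i c e", lexical tape to be filled in. Second: lexical tape "c a t N PL", surface tape to be filled in.]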
15
Problems/Challenges
  • Ambiguity: one word can correspond to multiple
    structures
  • Spelling changes: may occur when two morphemes
    are combined (inflectionally)
  • e.g., butterfly + -s → butterflies

16
Ambiguity
  • ND recognition: multiple paths through a machine
    may lead to an accept state (it didn't matter
    which path was actually traversed)
  • In ND parsing the path to an accept state does
    matter: different paths represent different
    parses, and different outputs will result

[Figure: FST with states q0 to q7; arc labels as transcribed: l:l, i:i, e:e, N:ε, PL:s, V:ε, PL:s]
17
Ambiguity: a more complex example
  • What's the right parse for "unionizable"?
  • Union-ize-able
  • Un-ion-ize-able
  • Each would represent a valid path through an FST
    for derivational morphology.

18
Deal with Morphological Ambiguity
  • There are a number of ways to deal with this
    problem
  • Simply take the first output found
  • Find all the possible outputs (all paths) and
    return them all (without choosing)
  • Bias the search so that only one or a few likely
    paths are explored

Then use part-of-speech tagging to choose
19
Spelling Changes
  • When morphemes are combined inflectionally the
    spelling at the boundaries may change
  • Examples
  • E-insertion: when -s is added to a word, -e- is
    inserted if the word ends in -s, -z, -sh, -ch, or -x
    (e.g., kiss, miss, waltz, bush, watch, rich, box)
  • Y-replacement: when -s or -ed is added to a word
    ending in -y, the -y changes to -ie or -i
    respectively (e.g., try, butterfly)
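
The two rules above can be written out as a small function to make their conditions concrete. This is a sketch, not a transducer: the name `attach_suffix` is hypothetical, and restricting Y-replacement to a y that follows a consonant is an added assumption (the slide only says "ending with a y"), included so that a word like "play" is left alone.

```python
import re


def attach_suffix(stem: str, suffix: str) -> str:
    """Join stem and suffix, applying the two spelling rules."""
    # E-insertion: -s after s, z, sh, ch, or x becomes -es.
    if suffix == "s" and re.search(r"(s|z|sh|ch|x)$", stem):
        return stem + "es"
    # Y-replacement (assumed to require a consonant before the y).
    if re.search(r"[^aeiou]y$", stem):
        if suffix == "s":
            return stem[:-1] + "ies"   # y -> ie before -s
        if suffix == "ed":
            return stem[:-1] + "ied"   # y -> i before -ed
    return stem + suffix
```

For example, `attach_suffix("watch", "s")` gives `"watches"`, `attach_suffix("butterfly", "s")` gives `"butterflies"`, and `attach_suffix("try", "ed")` gives `"tried"`.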

20
Solution: Multi-Tape Machines
  • Add intermediate tape
  • Use the output of one tape machine as the input
    to the next
  • Add intermediate symbols
  • morpheme boundary
  • word boundary

21
Multi-Level Tape Machines
FST-1
FST-2
  • FST-1 translates between the lexical and the
    intermediate level
  • FST-2 handles the spelling changes (due to one
    rule) to the surface tape
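
The two-stage arrangement can be sketched as two functions chained through an intermediate string. Several things here are assumptions rather than content from the slides: the symbols `^` (morpheme boundary) and `#` (word boundary) follow the common textbook convention, since the transcript lost the actual symbols; plain functions stand in for the two transducers; and the names `lexical_to_intermediate` and `intermediate_to_surface` are hypothetical. The second stage implements one rule only, E-insertion.

```python
import re


def lexical_to_intermediate(stem: str, features: list[str]) -> str:
    """Stand-in for FST-1: class tags vanish, PL and 3SG become ^s."""
    suffix = "^s" if features and features[-1] in ("PL", "3SG") else ""
    return stem + suffix + "#"


def intermediate_to_surface(form: str) -> str:
    """Stand-in for FST-2: apply E-insertion, then drop the boundaries."""
    form = re.sub(r"(s|z|sh|ch|x)\^s#", r"\1es#", form)
    return form.replace("^", "").replace("#", "")


def generate(stem: str, features: list[str]) -> str:
    # The output of the first machine is the input of the second.
    return intermediate_to_surface(lexical_to_intermediate(stem, features))
```

Under these assumptions, `lexical_to_intermediate("fox", ["N", "PL"])` yields `"fox^s#"`, and `generate("fox", ["N", "PL"])` yields `"foxes"`, while `generate("cat", ["N", "PL"])` yields `"cats"` because the rule does not fire.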

22
FST-1 for inflectional morphology of plural
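[Figure: FST-1 with arcs for some regular nouns and some irregular nouns; transcribed arc labels include PL:s and o:i, the remaining labels are not recoverable from the transcript]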
23
Example
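[Figure: two tape pairs between the lexical and intermediate levels. First: lexical tape "f o x N PL". Second: lexical tape "m o u s e N PL". The intermediate tapes are to be filled in.]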
24
FST-2 for E-insertion (Intermediate to Surface)
  • E-insertion: when -s is added to a word, -e- is
    inserted if the word ends in -s, -z, -sh, -ch, or -x
  • as in fox + s → foxes

[Figure: transducer for the E-insertion rule]
25
Examples
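[Figure: two tape pairs between the intermediate and surface levels, for the intermediate forms of "fox" plus "s" and "box" plus "ing" (boundary symbols lost in the transcript); the surface tapes are to be filled in.]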
26
Where are we?
27
Final Scheme Part 1
28
Final Scheme Part 2
29
Intersection (T1,T2)
  • States of T1 and T2: Q1 and Q2
  • States of intersection: Q1 × Q2
  • Transitions of T1 and T2: δ1, δ2
  • Transitions of intersection: δ3
  • δ3((xa, ya), i:c) = (xb, yb) iff
  • δ1(xa, i:c) = xb AND
  • δ2(ya, i:c) = yb
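
The product construction above can be sketched directly on transition tables. This is a minimal sketch under assumptions: each table is a Python dict from `(state, (i, c))` to the next state, start and final states are left out, and the function name `intersect` and the example tables `t1` and `t2` are hypothetical.

```python
def intersect(delta1: dict, delta2: dict) -> dict:
    """Build the transition table of the intersection of two transducers."""
    delta3 = {}
    for (xa, sym1), xb in delta1.items():
        for (ya, sym2), yb in delta2.items():
            # A joint move exists only if both machines can take
            # the same i:c pair from their current states.
            if sym1 == sym2:
                delta3[((xa, ya), sym1)] = (xb, yb)
    return delta3


# Hypothetical tables: they agree on a:b but disagree on what c maps to.
t1 = {("p0", ("a", "b")): "p1", ("p1", ("c", "d")): "p2"}
t2 = {("r0", ("a", "b")): "r1", ("r1", ("c", "e")): "r2"}
t3 = intersect(t1, t2)
```

Only the shared pair survives: `t3` holds a single transition from `("p0", "r0")` to `("p1", "r1")` on `("a", "b")`.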

30
Composition (T1,T2)
  • States of T1 and T2: Q1 and Q2
  • States of composition: Q1 × Q2
  • Transitions of T1 and T2: δ1, δ2
  • Transitions of composition: δ3
  • δ3((xa, ya), i:o) = (xb, yb) iff
  • there exists c such that
  • δ1(xa, i:c) = xb AND
  • δ2(ya, c:o) = yb
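
The composition rule can be sketched the same way, together with a check that running the composed table matches running the two machines in sequence. The sketch makes simplifying assumptions: tables are dicts from `(state, (i, o))` to the next state, there is no special handling of empty symbols, final states are ignored, and the result is assumed to stay deterministic. The names `compose`, `translate`, `t1`, and `t2` are hypothetical.

```python
def compose(delta1: dict, delta2: dict) -> dict:
    """Pair an i:c move of T1 with a c:o move of T2 to get an i:o move."""
    delta3 = {}
    for (xa, (i, c1)), xb in delta1.items():
        for (ya, (c2, o)), yb in delta2.items():
            if c1 == c2:  # the shared middle symbol c
                delta3[((xa, ya), (i, o))] = (xb, yb)
    return delta3


def translate(delta: dict, start, symbols):
    """Run a deterministic table as a translator; None if it gets stuck."""
    state, out = start, []
    for s in symbols:
        for (src, (i, o)), dst in delta.items():
            if src == state and i == s:
                state, out = dst, out + [o]
                break
        else:
            return None
    return out


# Hypothetical one-state machines: T1 maps a->b, b->c; T2 maps b->x, c->y.
t1 = {(0, ("a", "b")): 0, (0, ("b", "c")): 0}
t2 = {(0, ("b", "x")): 0, (0, ("c", "y")): 0}
t3 = compose(t1, t2)
```

With these tables, `translate(t3, (0, 0), "ab")` produces the same output as feeding `translate(t1, 0, "ab")` into `t2`, which is the property that lets a cascade be collapsed into one machine.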

31
Other important applications of FSTs in NLP
  • Segmentation: finding word boundaries in text
    (?!)
  • Shallow syntactic parsing: e.g., find only noun
    phrases
  • Dialogue Act Disambiguation right (IUI-04)
  • Phonological Rules.

32
FSTs in Practice
  • Install an FST package (pointers)
  • Describe your formal language (e.g., lexicon,
    morphotactics, and rules) in a RegExp-like notation
    (pointer)
  • Your specification is compiled into an FST
  • NOTE: FSTs for the morphology of a natural
    language may have 10^5 to 10^7 states and arcs

33
Computational problems in Morphology
  • Recognition: recognize whether a string is an
    English word (FSA)
  • Parsing/Generation (FST): word ↔ stem, class,
    lexical features
    e.g., lies ↔ lie N PL
          lies ↔ lie V 3SG
  • Stemming: word → stem
34
Stemmer
  • E.g. the Porter algorithm (Appendix B), which is
    based on a series of sets of simple cascaded
    rewrite rules
  • ATIONAL → ATE (relational → relate)
  • ING → ε if stem contains vowel (motoring → motor)
  • Cascade of rules applied to computerization
  • -ization → -ize: computerize
  • -ize → ε: computer
  • Errors occur
  • organization → organ, doing → doe, university →
    universe
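
The cascade idea can be illustrated with the handful of rules quoted above. This is a toy, not the Porter algorithm: the real stemmer has many more rules and extra conditions on the stem. The grouping of rules into three stages, and the names `replace_suffix`, `STAGES`, and `toy_stem`, are assumptions made for the illustration.

```python
import re

VOWEL = re.compile(r"[aeiou]")


def replace_suffix(word, suffix, replacement, needs_vowel=False):
    """Rewrite one suffix, optionally only if the stem contains a vowel."""
    if not word.endswith(suffix):
        return word
    stem = word[: -len(suffix)]
    if needs_vowel and not VOWEL.search(stem):
        return word
    return stem + replacement


# Each stage is a list of (suffix, replacement, needs_vowel) rules.
STAGES = [
    [("ational", "ate", False), ("ization", "ize", False)],
    [("ize", "", False)],
    [("ing", "", True)],
]


def toy_stem(word: str) -> str:
    """Run the stages in order; at most one rule fires per stage."""
    for stage in STAGES:
        for suffix, replacement, needs_vowel in stage:
            new = replace_suffix(word, suffix, replacement, needs_vowel)
            if new != word:
                word = new
                break
    return word
```

On this toy rule set, `toy_stem("computerization")` passes through "computerize" and ends at `"computer"`, and `toy_stem("organization")` ends at `"organ"`, reproducing the kind of error listed above. The vowel condition keeps "sing" from being reduced to "s".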

35
Stemming mainly used in Information Retrieval
  • Run a stemmer on the documents to be indexed
  • Run a stemmer on users' queries
  • Compute similarity between queries and documents
    (based on stems they contain)
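
These three steps can be sketched in a few lines. Everything specific here is an assumption for illustration: the crude suffix stripper `strip_suffix` stands in for a real stemmer, and Jaccard overlap of stem sets is one simple choice of similarity, which the slide does not specify.

```python
def strip_suffix(word: str) -> str:
    """Very crude stand-in for a stemmer: drop a final -ing or -s."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stems(text: str) -> set:
    """Stem every whitespace-separated token of a query or document."""
    return {strip_suffix(w) for w in text.lower().split()}


def similarity(query: str, document: str) -> float:
    """Jaccard overlap between the stem sets of query and document."""
    q, d = stems(query), stems(document)
    union = q | d
    return len(q & d) / len(union) if union else 0.0
```

With stemming, `similarity("indexing documents", "index document")` is 1.0, although the two strings share no surface word.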

36
Porter as an FST
  • The original exposition of the Porter stemmer did
    not describe it as a transducer but
  • Each stage is a separate transducer
  • The stages can be composed to get one big
    transducer

37
Formalisms and associated Algorithms
Linguistic Knowledge → Formalism
  • (English) Morphology: State Machines (no prob.):
    Finite State Automata (and Regular Expressions),
    Finite State Transducers
  • Syntax: Rule systems (and prob. version), e.g.,
    (Prob.) Context-Free Grammars
  • Semantics: Logical formalisms (First-Order Logics)
  • Pragmatics, Discourse and Dialogue: AI planners
38
Next Time
  • Intro to probability and information theory
  • On your preferred source read about
  • Conditional probability
  • Bayes rule
  • Independence
  • Entropy
  • Conditional Entropy and Mutual Information

39
Lexical to Intermediate Level
40
FST for inflectional morphology of plural
Some regular-nouns
Some irregular-nouns
41
Foxes
42
FST Review
  • FSTs allow us to take an input and deliver a
    structure based on it
  • Or take a structure and create a surface form
  • Or take a structure and create another structure

43
Formalisms and associated Algorithms
Linguistic Knowledge → Formalism
  • (English) Morphology: State Machines (no prob.):
    Finite State Automata (and Regular Expressions),
    Finite State Transducers
  • Syntax: Rule systems (and prob. version), e.g.,
    (Prob.) Context-Free Grammars
  • Semantics: Logical formalisms (First-Order Logics)
  • Pragmatics, Discourse and Dialogue: AI planners
44
Review
  • In many applications it's convenient to decompose
    the problem into a set of cascaded transducers,
    where the output of one feeds into the input of
    the next.

45
English Spelling Changes
  • We use one machine to transduce between the
    lexical and the intermediate level, and another
    to handle the spelling changes to the surface
    tape

46
FST can be used as
  • Translators: input one string (a sequence from
    I), output another one (a sequence from O), or
    vice versa
  • Recognizers: input both strings (a sequence from
    I×O)
  • Generators: output both strings (a sequence from
    I×O)