Syntax-based Statistical Machine Translation Models

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

Title: Syntax-based Statistical Machine Translation Models

1
Syntax-based Statistical Machine Translation
Models

Amr Ahmed
March 26th 2008

2
Outline

The Translation Problem
The Noisy Channel Model
Syntax-light SMT
Why Syntax?
Syntax-based SMT Models
Summary

3
Statistical Machine Translation
Problem

Given a sentence (f) in one language, produce its
equivalent in another language (e)

"I know how to do this":
"One naturally wonders if the problem of
translation could conceivably be treated as a
problem in cryptography. When I look at an
article in Arabic, I say: 'This is really written
in English, but it has been coded in some strange
symbols. I will now proceed to decode.'"
(Warren Weaver, 1947)
4
Statistical Machine Translation
Problem

Given a sentence (f) in one language, produce its
equivalent in another language (e)

Noisy Channel Model
[Diagram: e, generated with probability P(e), passes through the noisy channel and is observed as f.]
We know how to factor P(e)!
P(e) models good English; P(f|e) models good
translation
Today: how to factor P(f|e)?
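
The noisy-channel decomposition behind this slide follows from Bayes' rule. Written out, with \hat{e} as an assumed name for the decoder's output:

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} P(e)\,P(f \mid e)
```

P(f) can be dropped because it does not depend on e. The language model P(e) and the translation model P(f|e) are then estimated separately.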
5
Outline

The Translation Problem
The Noisy Channel Model
Syntax-light SMT
Word-based Models
Phrase-based Models
Why Syntax?
Syntax-based SMT Models
Summary

6
Word-Translation Models
[Figure: word alignment between the German sentence "Auf diese Frage habe ich leider keine Antwort bekommen" and the English sentence "NULL I did not unfortunately receive an answer to this question".]
Blue word links aren't observed in data.

What is the generative story?
IBM Models 1-4
Roughly equivalent to an FST (modulo reordering)
Learning and decoding?

Slide credit: adapted from Smith et al.
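
As an illustration of how the hidden word links can be learned, here is a minimal sketch of EM training for IBM Model 1, the simplest of the models named above. The three-sentence bitext and the names `train_ibm1` and `BITEXT` are assumptions made for this example and are not from the slides.

```python
from collections import defaultdict

def train_ibm1(bitext, iterations=10):
    """EM for IBM Model 1. bitext: list of (foreign_tokens, english_tokens).
    Returns t[(f, e)], an estimate of P(f | e)."""
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of the link (f, e)
        total = defaultdict(float)  # expected count of e being linked at all
        for fs, es in bitext:
            es = ["NULL"] + es      # NULL lets a foreign word align to nothing
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z   # E-step: posterior of the link f-e
                    count[(f, e)] += p
                    total[e] += p
        # M-step: count and normalize
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Toy data (hypothetical), German-English
BITEXT = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(BITEXT)
```

On this toy data, `t[("das", "the")]` ends up larger than `t[("Haus", "the")]`, because "das" co-occurs with "the" in two sentence pairs. Models 2-4 add distortion and fertility parameters on top of this lexical table.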
7
Word-Based Translation Models
[Diagram: channel operations transform e into f.]
In a nutshell, the operations are:
- Stochastic
- Associated with probabilities
- Estimated using EM
Q: What are we learning? A: Word movement
Linguistic hypothesis (leading to phrase-based models):
1- Words move in blocks
2- Context is important
8
Phrase-Based Translation Models
[Diagram: e is transformed into f in three steps: segmentation, translation, re-ordering.]
In a nutshell, the operations are:
- Stochastic
- Associated with probabilities
- Estimated using EM
Q: What are we learning? A: Word movement
Linguistic hypothesis (phrase-based models):
1- Words move in blocks
2- Context is important
Markovian dependency
9
Phrase-Based Translation Models
[Diagram: e is transformed into f in three steps: segmentation, translation, re-ordering. The diagram is labeled a1, a2, a3.]
In a nutshell, the operations are:
- Stochastic
- Associated with probabilities
- Estimated using EM
Q: What are we learning? A: Word movement
Linguistic hypothesis (phrase-based models):
1- Words move in blocks
2- Context is important
Markovian dependency
10
Phrase-Based Models Example
Not necessarily syntactic phrases
Division into phrases is hidden
[Figure: phrase alignment between "Auf diese Frage habe ich leider keine Antwort bekommen" and "I did not unfortunately receive an answer to this question".]
Score each phrase pair using several features
Slide credit: from Smith et al.
11
Phrase Table Estimation
Basically: count and normalize
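
"Count and normalize" can be sketched as a relative-frequency estimate over extracted phrase pairs. The function name `estimate_phrase_table` and the toy list of phrase pairs are assumptions for illustration. Real systems extract the pairs from word alignments and combine several such scores.

```python
from collections import Counter

def estimate_phrase_table(phrase_pairs):
    """Relative-frequency estimate phi(f | e) = count(f, e) / count(e)."""
    pair_counts = Counter(phrase_pairs)
    e_counts = Counter(e for _, e in phrase_pairs)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

# Hypothetical extracted (foreign, English) phrase pairs
PAIRS = [
    ("es gibt", "there is"),
    ("es gibt", "there is"),
    ("gibt es", "there is"),
    ("keine Antwort", "no answer"),
]
table = estimate_phrase_table(PAIRS)
```

Here `table[("es gibt", "there is")]` is 2/3, since two of the three phrase pairs with English side "there is" have "es gibt" on the foreign side.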
13
Outline

The Translation Problem
The Noisy Channel Model
Syntax-light SMT
Why Syntax?
Syntax-based SMT Models
Summary

14
Why Syntax?

Reference: consequently proposals are submitted
to parliament under the assent procedure, meaning
that parliament can no longer table amendments,
as directives in this area were adopted as single
market legislation under the codecision procedure
on the basis of art.100a tec.
Translation: consequently, the proposals
parliament after the assent procedure, the tabled
amendments for offers no possibility of community
directives, because as part of the internal
market legislation on the basis of article 100a
of the treaty in the codecision procedure have
been adopted.

Slide credit: example from Cowan et al.
15
Why Syntax?
Slide credit: adapted from Cowan et al.

Here syntax can help!
What went wrong?

Phrase-based systems are very good at predicting
content words, but are less accurate at producing
function words or at producing output that correctly
encodes grammatical relations between content words.

16
Structure Does Help!
Does adding more structure help?
[Diagram: a noisy channel maps the English structure Se (x1 x2 x3) to the foreign structure Sf (x2 x1 x3).]
Word-based → Phrase-based → Syntax-based: better performance?
17
Syntax and the Translation Pipeline
[Diagram: Input → Translation system → Output. Syntax can enter at three points:]
- Input: pre-reordering
- Translation system: syntax in the translation model
- Output: post-processing (re-ranking)
18
Early Exposition (Koehn et al. 2003)

Fix a phrase-based system and vary the way
phrases are extracted
Frequency-based, generative, constituent
Adding syntax hurts the performance
Phrase pairs like "there is" ↔ "es gibt" are not
constituents (this eliminates 80% of phrase pairs)
Explanation
No hierarchical re-ordering
Syntax is not fully exploited here!
Parse trees contain errors

19
Outline

The Translation Problem
The Noisy Channel Model
Syntax-light SMT
Why Syntax?
Syntax-based SMT Models
Summary

20
The Big Picture Translation Models
[Figure: String → Syntax → Inter-lingua pyramids, one per model family, showing the level at which each family operates on the source and target sides: word-based; phrase-based; SCFG (Chiang 2005) and ITG (Wu 97); tree-tree transducers; tree-string transducers; string-tree transducers.]
21
Learning Synchronous Grammar

No linguistic annotation
Model P(e,f) jointly
Trees are hidden variables
EM doesn't work well with large amounts of missing
information
Structural restrictions
Binary rules (ITG, Wu 97)
Lexical restriction (Chiang 2005)

SCFG to represent hierarchical phrases
What is a synchronous grammar?
22
Interlude: Synchronous Grammar

Extension of monolingual theory to bitext
CFG → SCFG
TAG → STAG
etc.
Monolingual parsers are extended for bitext
parsing

23
Synchronous Grammar: SCFG
[Figure: a CFG derivation shown next to the corresponding SCFG derivation.]
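
To make the CFG-to-SCFG step concrete, here is a minimal sketch of a synchronous derivation. Each rule rewrites one nonterminal into a pair of right-hand sides whose nonterminals are linked, so a single derivation yields two strings at once. The English-French toy grammar, the rule ids and the function `realize` are assumptions for illustration only.

```python
# Each rule: (lhs, source_rhs, target_rhs).
# Integers in a right-hand side are links to the rule's child derivations;
# strings are terminals.
RULES = {
    "np":  ("NP",  [0, 1, 2], [0, 2, 1]),   # NP -> <DT ADJ N, DT N ADJ>
    "dt":  ("DT",  ["the"],   ["la"]),
    "adj": ("ADJ", ["blue"],  ["bleue"]),
    "n":   ("N",   ["house"], ["maison"]),
}

def realize(derivation):
    """Expand a derivation (rule_id, children) into (source, target) token lists."""
    rule_id, children = derivation
    _, src_rhs, tgt_rhs = RULES[rule_id]
    kids = [realize(child) for child in children]

    def side(rhs, which):
        out = []
        for sym in rhs:
            if isinstance(sym, int):        # linked nonterminal
                out.extend(kids[sym][which])
            else:                           # terminal
                out.append(sym)
        return out

    return side(src_rhs, 0), side(tgt_rhs, 1)

derivation = ("np", [("dt", []), ("adj", []), ("n", [])])
source, target = realize(derivation)
```

The same derivation produces `source == ["the", "blue", "house"]` and `target == ["la", "maison", "bleue"]`. The reordering lives entirely in the paired right-hand sides of the `np` rule.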
24
Learning Synchronous Grammar

No linguistic annotation
Model P(e,f) jointly
Trees are hidden variables
EM doesn't work well with large amounts of missing
information
Structural restrictions
Binary rules (ITG, Wu 97)
Lexical restriction (Chiang 2005)

SCFG to represent hierarchical phrases
What is a synchronous grammar?
How?
25
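Hierarchical Phrase-based Model
[Figure: hierarchical phrase-based models. Paired trees rooted at S1 and S2 interleave the linked nonterminals x1, x2, x3 with terminals (f1, f3, f4 on the foreign side; e1, e3, e4, e5, e6 on the English side).]
[Figure: phrase-based models. A flat structure Se (x1 x2 x3) is mapped to Sf (x2 x1 x3).]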
26
Example (Chiang 2005)
27
Hierarchical Phrase-based Model
Question 1
How to train the model?
What are the restrictions?
- At most two recursive phrases
- Restriction on length
Question 2
How to decode?
28
Training and Decoding

Collect initial grammar rules

29
Training and Decoding

Collect initial grammar rules
Tune rule weights: count and normalize!
Decoding
CYK (remember, rules have at most two
non-terminals)
Parse the f part only.
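
The idea of decoding by parsing only the f side can be sketched with a small memoized span parser. This is a simplification of CYK-style chart decoding, not Chiang's decoder: there is no language model and no pruning. The rules, their weights and the function `decode` are hypothetical. Each rule pairs an f side with an e side, and integers mark linked gaps (at most two per rule).

```python
from functools import lru_cache

# (f_side, e_side, weight); integers are linked gaps (nonterminals).
# All rules and weights below are made up for illustration.
RULES = [
    (("il",), ("he",), 1.0),
    (("va",), ("go",), 1.0),
    (("ne", 0, "pas"), ("does", "not", 0), 0.9),
    ((0, 1), (0, 1), 0.5),  # glue rule: monotone concatenation
]

def decode(f_words):
    """Return (score, english_tokens) for the best derivation, or None."""
    f = tuple(f_words)

    def match(pattern, i, j):
        """Yield {gap: (start, end)} for each way pattern covers f[i:j]."""
        if not pattern:
            if i == j:
                yield {}
            return
        head, rest = pattern[0], pattern[1:]
        if isinstance(head, int):
            for k in range(i + 1, j + 1):       # gaps must be non-empty
                for m in match(rest, k, j):
                    yield {head: (i, k), **m}
        elif i < j and f[i] == head:
            yield from match(rest, i + 1, j)

    @lru_cache(maxsize=None)
    def best(i, j):
        top = None
        for f_side, e_side, weight in RULES:
            for fillers in match(f_side, i, j):
                subs = {gap: best(a, b) for gap, (a, b) in fillers.items()}
                if any(sub is None for sub in subs.values()):
                    continue                    # some gap cannot be parsed
                score = weight
                for sub in subs.values():
                    score *= sub[0]
                english = []
                for sym in e_side:              # read off the e side
                    if isinstance(sym, int):
                        english.extend(subs[sym][1])
                    else:
                        english.append(sym)
                if top is None or score > top[0]:
                    top = (score, tuple(english))
        return top

    return best(0, len(f))

result = decode(["il", "ne", "va", "pas"])
```

For the input "il ne va pas" the only derivation glues "il" to the span covered by the "ne X pas" rule, so `result[1]` is `("he", "does", "not", "go")`. The English string is assembled as a by-product of parsing the foreign side.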

30
Does it help?

Experimental details
Mandarin-to-English (FBIS corpus)
7.2M + 9.2M words
Dev set: NIST 2002 MT evaluation
Test set: NIST 2003 MT evaluation
7.5% relative improvement over phrase-based
models using BLEU score
0.02 absolute improvement over baseline
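
As a quick consistency check on the two improvement figures, read 7.5 as a percentage and assume both numbers describe the same pair of systems. With b as the baseline BLEU score and Δ as the absolute gain (symbols introduced here, not on the slide):

```latex
\frac{\Delta}{b} = 0.075,\quad \Delta = 0.02
\;\Longrightarrow\;
b \approx \frac{0.02}{0.075} \approx 0.27,
\qquad b + \Delta \approx 0.29
```

So the two figures together imply a baseline BLEU of roughly 0.27 and a hierarchical-model BLEU of roughly 0.29.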

31
Does it help?

7.5% relative improvement over phrase-based models

Learnt rules are formally SCFG but not
linguistically interpretable
The model learns re-ordering patterns guided by
lexical function words
It captures long-range movements via recursion

32
Follow-Up study

Why not decorate the phrases with their
grammatical constituents?
Zollmann et al. 2006, 2007
If possible, decorate the phrase with a
constituent
Generalize phrases as in Chiang 2005
Parse using chart parsing
Moved from 31.85 → 32.15 over the CMU phrase-based
system
Spanish-English corpus

33
The Big Picture Translation Models
[Figure: same String → Syntax → Inter-lingua pyramid diagram as on slide 20, covering word-based, phrase-based, SCFG (Chiang 2005) and ITG (Wu 97), tree-tree, tree-string and string-tree transducers.]
34
Tree-String Transducers

Linguistic tools
English parse trees
from a statistical parser
Alignment
from Giza
Conditional model
P(f | Te)
Models differ on
How to factor P(f | Te)
Domain of locality
SCFG (Yamada & Knight 2001)
STSG (Galley et al. 2004)

Caveat
35
Tree-String (Yamada & Knight)

Back to the noisy channel model
Transduces Te into f
Stochastic channel operations (on trees)
Reorder children
Insert node
Lexical translation
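
The three channel operations can be sketched as a deterministic pass over an English tree. The tree, the tables `REORDER`, `INSERT` and `TRANSLATE`, and the function `channel` are assumptions, loosely modeled on the English-Japanese example commonly used to illustrate this model. In the real model each operation is sampled from a learned probability table rather than looked up.

```python
# English parse tree: (label, word) for leaves, (label, [children]) otherwise.
TREE = ("VB", [("PRP", "he"),
               ("VB1", "adores"),
               ("VB2", [("VB", "listening"),
                        ("TO", [("TO", "to"), ("NN", "music")])])])

# Operations are keyed by node path (tuple of child indices from the root).
# All three tables are illustrative assumptions, not learned values.
REORDER = {(): [0, 2, 1], (2,): [1, 0], (2, 1): [1, 0]}
INSERT = {(0,): ("right", "ha"), (2, 0): ("right", "no"),
          (2,): ("right", "ga"), (1,): ("right", "desu")}
TRANSLATE = {"he": "kare", "adores": "daisuki", "listening": "kiku",
             "to": "wo", "music": "ongaku"}

def channel(node, path=()):
    """Apply reorder, insert and translate; return the foreign word list."""
    label, body = node
    if isinstance(body, str):
        words = [TRANSLATE[body]]               # lexical translation
    else:
        order = REORDER.get(path, range(len(body)))
        words = []
        for k in order:                         # reorder children
            words += channel(body[k], path + (k,))
    if path in INSERT:                          # insert a word at this node
        side, word = INSERT[path]
        words = words + [word] if side == "right" else [word] + words
    return words

foreign = channel(TREE)
```

With these tables the output is "kare ha ongaku wo kiku no ga daisuki desu". Every operation is local to one node and its children, which is why the model can only express child reordering.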

36
Channel operations
Reordering: P(VB TO → TO VB)
Insertion: P(right | PRP) · Pi(ha)
37
Learning

Learn channel operation probabilities
Reordering
Insertion
Translation
Standard EM training
E-step: compute expected rule counts (dynamic programming)
M-step: count and normalize

38
Decoding As Parsing

In a nutshell, we learnt how to parse the foreign
side
Add CFG rules from the English side
Channel rules
Reordering
If (VB2 → VB TO) is reordered as (VB2 → TO VB),
add the rule VB2 → TO VB with probability p
Insertion
V → X V with probability pl, V → V X with probability pr, and X → fi
Translation
ei → fi with probability pt

39
Decoding Example
40
Results and Expressiveness

English-Chinese task
Short sentences, < 20 words (3M word corpus)
Test set: 347 sentences with at most 14 words
Better BLEU score (0.102) than IBM-4 (0.072)

What it can represent

Depends on the syntactic divergence between language
pairs
Trees must be isomorphic up to child re-ordering
Channel rules have the following format: [rule figure: child re-ordering]

Q: What can't it model?
41
Limitations

Can't model syntactic movements that cross
brackets
SVO to VSO
Modal movement between English and French
Not → ne .. pas (from English to French)

[Figure: the English VP "does not go" (Aux, Not, VB) next to the French "ne va pas".]
The span of "not" can't intersect that of "go"
Can't interleave green with the other two
42
Limitations Possible solutions

Some follow-up studies showed relative improvement:
Gildea 2003 added cloning operations
AER went from 0.42 → 0.3 on a Korean-English corpus

[Figure: the English VP "does not go" (Aux, Not, VB) next to the French "ne va pas".]
The span of "not" can't intersect that of "go"
Can't interleave green with the other two
43
Tree-String Transducers

Linguistic tools
English parse trees
from a statistical parser
Alignment
from Giza
Conditional model
P(f | Te)
Models differ on
How to factor P(f | Te)
Domain of locality
SCFG (Yamada & Knight 2001)
STSG (Galley et al. 2004)

Caveat
44
Learning Expressive Rules (Galley et al. 2004)
Yamada & Knight: channel operation tables → parsing rules for the F-side (CFG rules) over f1, f2, ..., fn
Galley et al. 2004: rule extraction → TSG rules

Condition on larger fragments of the trees

45
Rule format & Decoding
[Figure: Rule 1 pairs the foreign string "ne VB pas" with an English VP tree fragment (Aux "does", "not", VB). A derivation step applies it to the current state "NP ne VB pas", giving "NP VP". The tree-fragment rule is contrasted with a CFG rule. The example sentence is "he does not go".]

Tree is built bottom-up
Foreign string at each derivation step may have
non-terminals
Rules are extracted from the training corpus
English-side trees
Foreign-side strings
Alignment from Giza

46
Rule Extraction
[Figure: the parse tree of "he does not go" (S over NP and VP; NP over PRP; VP over Aux, RB, VB) aligned to "il ne va pas" (he-il, not-ne, not-pas, go-va), with the alignments projected upward through the tree.]
Upward projection
Frontier nodes: nodes whose span is exclusive
Frontier set: S, NP, PRP, he, VP, VB, go
[Figure: frontier graph]
Extract rules as before
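
The frontier-node test can be sketched as follows. A node is in the frontier set when the contiguous closure of its span on the foreign side does not overlap the spans of nodes outside its own subtree and ancestor chain. The tree encoding, the alignment table and the function names are assumptions for illustration, as is the choice to exclude nodes with an empty span (such as the unaligned "does").

```python
# Tree nodes are (label, [children]); labels are unique in this toy example.
TREE = ("S", [("NP", [("PRP", [("he", [])])]),
              ("VP", [("Aux", [("does", [])]),
                      ("RB", [("not", [])]),
                      ("VB", [("go", [])])])])
FOREIGN = ["il", "ne", "va", "pas"]
ALIGN = {"he": {0}, "not": {1, 3}, "go": {2}}   # "does" is unaligned

def spans(node, out):
    """Foreign positions reachable from each node (upward projection)."""
    label, kids = node
    s = set(ALIGN.get(label, ())) if not kids else set()
    for kid in kids:
        s |= spans(kid, out)
    out[label] = s
    return s

def frontier(node, complement, span_of, out):
    """Collect labels whose span closure does not overlap the complement span."""
    label, kids = node
    s = span_of[label]
    if s and not (set(range(min(s), max(s) + 1)) & complement):
        out.append(label)
    for kid in kids:
        siblings = set().union(*(span_of[o[0]] for o in kids if o is not kid))
        frontier(kid, complement | siblings, span_of, out)
    return out

span_of = {}
spans(TREE, span_of)
frontier_set = frontier(TREE, set(), span_of, [])
```

On this example `frontier_set` is `["S", "NP", "PRP", "he", "VP", "VB", "go"]`, matching the set listed on the slide. RB and "not" are excluded because their span {ne, pas} encloses "va", which belongs to "go".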
47
Illustrating Rule Extraction
48
Minimality of Extracted Rules

Other rules can be composed from these minimal
rules

[Figure: minimal rules for the VP "does not go" / "ne va pas" (Aux, RB, VB with "go" aligned to "va") composed into a larger VP rule.]
49
Probability Estimation

Just EM
Modified inside-outside for the E-step
Decoding as parsing
Training can be done using off-the-shelf
tree transducers (Knight et al. 2004)

50
Evaluation

Coverage
How well the learnt rules explain the corpus
100% coverage on F-E and C-E corpora

Translation results
Decoder was still a work in progress
51
The Big Picture Translation Models
[Figure: same String → Syntax → Inter-lingua pyramid diagram as on slide 20, covering word-based, phrase-based, SCFG (Chiang 2005) and ITG (Wu 97), tree-tree, tree-string and string-tree transducers.]
52
Tree-Tree Transducers