Title: DP-based Search Algorithms for Statistical Machine Translation
1. DP-based Search Algorithms for Statistical Machine Translation
- Presented by Mauricio Zuluaga
- Based on a presentation by Christoph Tillmann and on the paper "Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation", C. Tillmann, H. Ney
2. Computational Challenges in M.T.
- Source sentence f (French)
- Target sentence e (English)
- Bayes' rule: Pr(e|f) = Pr(e) · Pr(f|e) / Pr(f)
3. Computational Challenges in M.T.
- Estimating the language model probability Pr(e) (L.M. problem; trigram)
- Estimating the translation model probability Pr(f|e) (translation problem)
- Finding an efficient way to search for the English sentence that maximizes the product (search problem; decision rule stated below). We want to focus only on the most likely hypotheses during the search.
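The three problems fit together in the noisy-channel decision rule implied by the two slides above; stated compactly in LaTeX:

  \hat{e} \;=\; \operatorname*{arg\,max}_{e} \Pr(e \mid f)
          \;=\; \operatorname*{arg\,max}_{e} \Pr(e)\,\Pr(f \mid e)

Pr(f) is constant for a given source sentence, so it drops out of the maximization; this is why only the language model Pr(e) and the translation model Pr(f|e) need to be estimated, while the search problem is to carry out the arg max efficiently.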
4. Approach based on Bayes' rule
[Architecture diagram: Source Language Text -> Transformation -> Global Search (maximize Pr(e) · Pr(f|e) over e) -> Inverse Transformation -> Target Language Text]
5. Model Details
- Trigram language model
- Translation model (simplified)
  - Lexicon probabilities
  - Fertilities
  - Class-based distortion probabilities (combined with the other factors in the sketch below)
- "Here, j is the currently covered input sentence position and j' is the previously covered input sentence position. The input sentence length J is included, since we would like to think of the distortion probability as normalized according to J." (Tillmann)
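A minimal sketch (not the paper's implementation) of how these simplified translation-model factors might be combined when one more source position j is covered; `lexicon`, `fertility`, `distortion`, and `word_class` are assumed pre-trained lookup tables, and the trigram language model is applied separately over the target words.

def partial_score(e, f_j, j, j_prev, J, phi,
                  lexicon, fertility, distortion, word_class):
    """Score the choice of target word e for source word f_j at position j,
    given the previously covered source position j_prev and sentence length J."""
    p_lex = lexicon.get((f_j, e), 1e-10)                              # p(f_j | e)
    p_fert = fertility.get((e, phi), 1e-10)                           # p(phi | e)
    p_dist = distortion.get((j, j_prev, J, word_class[f_j]), 1e-10)   # p(j | j', J, class(f_j))
    return p_lex * p_fert * p_dist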
6. Model Details (Model 4 vs. Model 3)
- Same except in the handling of distortion probabilities.
- In Model 4 there are 2 separate distortion probabilities: one for the head of a tablet and one for the rest of the words of the tablet.
- The probability depends on the previous tablet and on the identity (class) of the French word being placed (e.g., adjectives appear before nouns in English but after them in French).
- "We expect d1(-1 | A(e), B(f)) to be larger than d1(+1 | A(e), B(f)) when e is an adjective and f is a noun. Indeed, this is borne out in the trained distortion probabilities for Model 4, where we find that d1(-1 | A(government's), B(développement)) is 0.7986, while d1(+1 | A(government's), B(développement)) is 0.0168."
- A and B are class functions of the English and French words (in this implementation, 50 classes each; see the small illustration below).
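A tiny illustration of the two-table idea (hypothetical data structures; the keys pair a displacement with the word classes, and the values are the trained probabilities quoted above):

# Model-4-style distortion: separate tables for the head of a tablet (d1)
# and for the remaining words of a tablet (d_gt1).
d1 = {(-1, "A(government's)", "B(développement)"): 0.7986,
      (+1, "A(government's)", "B(développement)"): 0.0168}
d_gt1 = {}  # analogous table for the non-head words of a tablet

# The adjective/noun reordering preference shows up as d1(-1 | ...) >> d1(+1 | ...):
head_before = d1[(-1, "A(government's)", "B(développement)")]
head_after = d1[(+1, "A(government's)", "B(développement)")]
assert head_before > head_after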
7. Decoder
- Others have followed different approaches for decoders.
- This is the part where we have to be efficient!
- Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation, C. Tillmann, H. Ney
- DP-based beam search decoder for IBM Model 4 (this is the one described in the previous paper)
8. Example Alignment
[Figure: word-to-word alignment (source to target); the alignment is a hidden variable. Example sentence pair - Source (German): "In diesem Fall kann mein Kollege Sie nicht am vierten Mai besuchen ." Target (English): "In this case my colleague can not visit you on the fourth of May ."]
9. Inverted Alignments
- Inverted alignment (target to source)
- Coverage constraint: introduce a coverage vector
[Figure: inverted alignment path over target positions (i-1, i) and source positions]
10. Traveling Salesman Problem
- Problem: visit J cities
- Costs for transitions between cities
- Visit each city exactly once, minimizing the overall costs
- Dynamic programming (Held-Karp, 1962)
- Cities correspond to source sentence positions (words; coverage constraint)
- Costs: negative logarithm of the product of the translation, alignment, and language model probabilities
11. Traveling Salesman Problem
- DP with auxiliary quantity D(C, j): the cost of the shortest path from city 1 to city j visiting all cities in C exactly once (sketched in code below)
- Complexity using DP: O(J^2 · 2^J)
- The order in which the cities in C are visited is not important
- Only the cost of the best path reaching j has to be stored
- Remember: the minimum edit distance formulation was also a DP search problem
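A compact sketch of the Held-Karp recursion, with coverage sets represented as bitmasks; `cost[a][b]` is an assumed transition-cost matrix (in the MT setting, the negative log-probabilities listed above).

from itertools import combinations

def held_karp(cost):
    """Held-Karp DP for the TSP: cost[a][b] is the cost of moving a -> b.
    D[(C, j)] = cost of the cheapest path that starts at city 0, visits exactly
    the cities in bitmask C, and ends at city j (city 0 is always in C).
    Time complexity: O(J^2 * 2^J)."""
    J = len(cost)
    D = {(1, 0): 0}                                   # only city 0 visited, ending at 0
    for size in range(2, J + 1):
        for subset in combinations(range(1, J), size - 1):
            C = 1 | sum(1 << c for c in subset)       # bitmask including city 0
            for j in subset:
                prev_C = C & ~(1 << j)                # coverage before visiting j
                D[(C, j)] = min(D[(prev_C, k)] + cost[k][j]
                                for k in range(J)
                                if (prev_C, k) in D)
    full = (1 << J) - 1
    # close the tour by returning to city 0
    return min(D[(full, j)] + cost[j][0] for j in range(1, J))

# Example with 4 cities (symmetric costs):
cost = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(held_karp(cost))   # -> 18 (optimal tour cost)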
12. [Figure: DP state graph for a 5-city example. Each node is labeled with the set of visited cities followed by the last visited city, e.g. (1,2,3 ; 3); the search expands from the start state (1 ; 1) to the final states (1,2,3,4,5 ; j).]
13. [Same DP state graph as the previous slide.]
14. M.T. Recursion Equation
- Maximum approximation
- Q_{e'}(e, C, j) is the probability of the best partial hypothesis (e_1 ... e_i, b_1 ... b_i) where C = {b_k | k = 1, ..., i}, b_i = j, e_i = e, and e_{i-1} = e' (recursion sketched below)
- Complexity: O(E^3 · J^2 · 2^J), where E is the size of the target-language vocabulary (still too large)
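A sketch of the recursion for the simplified model (lexicon, class-based distortion, and trigram LM only; fertilities omitted), using the quantity Q defined above:

  Q_{e'}(e, C, j) \;=\; p(f_j \mid e) \cdot
      \max_{e'',\; j' \in C \setminus \{j\}}
      \Big[\, p(j \mid j', J) \cdot p(e \mid e', e'') \cdot
              Q_{e''}(e', C \setminus \{j\}, j') \,\Big]

The maximum approximation enters here: instead of summing over alignments, only the best predecessor pair (e'', j') is kept for each state.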
15. DP-based Search Algorithm
16. IBM-Style Re-ordering (S3)
- Procedural restriction: select one of the first 4 empty (uncovered) positions to extend the hypothesis (see the sketch below)
- Upper bound for word reordering complexity
[Figure: source positions 1 ... j ... J]
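A minimal sketch of the S3 restriction, assuming the coverage vector is a bitmask over source positions (hypothetical helper, not the paper's code):

def s3_candidates(coverage, J, window=4):
    """Return the source positions a hypothesis may cover next under the
    IBM-style (S3) restriction: only the first `window` uncovered positions,
    counted from the left, are allowed."""
    uncovered = [j for j in range(J) if not (coverage >> j) & 1]
    return uncovered[:window]

# Example: positions 0 and 2 already covered in a 6-word sentence (bitmask 0b000101)
print(s3_candidates(0b000101, 6))   # -> [1, 3, 4, 5]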
17. Verb Group Re-ordering (GE)
- Complexity: mostly monotonic traversal from left to right
[Alignment figure for the same example sentence pair, illustrating verb group re-ordering.]
18. Beam Search Pruning
- Search proceeds cardinality-synchronously over coverage vectors
- Three pruning types:
  - Coverage pruning
  - Cardinality pruning
  - Observation pruning (the number of words produced by a source word f is limited)
19. Beam Search Pruning
- 4 kinds of thresholds:
  - the coverage pruning threshold t_C
  - the coverage histogram threshold n_C
  - the cardinality pruning threshold t_c (looks only at the cardinality)
  - the cardinality histogram threshold n_c (looks only at the cardinality)
- Define new probabilities based on the uncovered positions (using only trigram and lexicon probabilities).
- Maintain only the hypotheses above the thresholds.
20. Beam Search Pruning
- Compute the best score and apply the thresholds:
  - for each coverage vector
  - for each cardinality
- Use histogram pruning
- Observation pruning: for each source word, select the best target word(s) (loop structure sketched below)
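A skeletal sketch of how these pruning steps might fit into the cardinality-synchronous search loop. The data structures are hypothetical: `init_hyp`, `extend`, and `score` are assumptions supplied by the caller, the thresholds are those from slide 19, and the rest-cost estimate over uncovered positions mentioned there is assumed to be folded into `score`.

from collections import defaultdict

def beam_search(J, init_hyp, extend, score,
                t_C=0.01, n_C=50, t_c=0.001, n_c=500):
    """Cardinality-synchronous beam search over coverage bitmasks (a sketch).

    init_hyp      -- the empty starting hypothesis
    extend(h, j)  -- yields new hypotheses that additionally cover source position j
                     (observation pruning is assumed to happen inside extend:
                      only the best few target words per source word are tried)
    score(h)      -- probability of the partial hypothesis h
    """
    beam = [(0, init_hyp)]                          # list of (coverage bitmask, hypothesis)
    for c in range(J):                              # cardinality-synchronous loop
        if not beam:
            break
        # cardinality pruning + histogram: compare against the best at this cardinality
        best_c = max(score(h) for _, h in beam)
        beam = [(C, h) for C, h in beam if score(h) >= t_c * best_c]
        beam = sorted(beam, key=lambda x: score(x[1]), reverse=True)[:n_c]

        # coverage pruning + histogram: compare within each coverage vector
        by_cov = defaultdict(list)
        for C, h in beam:
            by_cov[C].append(h)
        new_beam = []
        for C, lst in by_cov.items():
            best_C = max(score(h) for h in lst)
            kept = [h for h in lst if score(h) >= t_C * best_C]
            kept = sorted(kept, key=score, reverse=True)[:n_C]
            # extend the surviving hypotheses to every still-uncovered position
            for h in kept:
                for j in range(J):
                    if not (C >> j) & 1:
                        for nh in extend(h, j):
                            new_beam.append((C | (1 << j), nh))
        beam = new_beam
    full = (1 << J) - 1
    finished = [h for C, h in beam if C == full]
    return max(finished, key=score, default=None)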
21. German-English Verbmobil
- German to English, IBM Model 4
- Evaluation measures: m-WER and SSER
- Training: 58K sentence pairs
- Vocabulary: 8K (German), 5K (English)
- TEST-331 (held-out data; used to tune the scaling factors for the language and distortion models)
- TEST-147 (evaluation)
22. Effect of Coverage Pruning
23. TEST-147 Translation Results
24. References
- Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation, C. Tillmann, H. Ney
- A DP-based Search Using Monotone Alignments in Statistical Translation, C. Tillmann, S. Vogel, H. Ney, A. Zubiaga
- The Mathematics of Statistical Machine Translation: Parameter Estimation, Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, Robert L. Mercer
- Accelerated DP Based Search for Statistical Translation, C. Tillmann, S. Vogel, H. Ney, A. Zubiaga, H. Sawaf
- Word Re-ordering and DP-based Search in Statistical Machine Translation, H. Ney, C. Tillmann