Title: DP-based Search Algorithms for Statistical Machine Translation
1. DP-based Search Algorithms for Statistical Machine Translation
- Presented by Mauricio Zuluaga
- Based on a presentation by Christoph Tillmann and on the paper "Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation", C. Tillmann, H. Ney
2. Computational Challenges in M.T.
- Source sentence f (French)
- Target sentence e (English)
- Bayes' rule: Pr(e|f) = Pr(e) · Pr(f|e) / Pr(f)
3. Computational Challenges in M.T.
- Estimating the language model probability Pr(e) (L.M. problem; trigram)
- Estimating the translation model probability Pr(f|e) (translation problem)
- Finding an efficient way to search for the English sentence that maximizes the product (search problem; decision rule stated below). We want to focus only on the most likely hypotheses during the search.
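The three problems fit together in the noisy-channel decision rule implied by the two slides above; stated compactly in LaTeX:

  \hat{e} \;=\; \operatorname*{arg\,max}_{e} \Pr(e \mid f)
          \;=\; \operatorname*{arg\,max}_{e} \Pr(e)\,\Pr(f \mid e)

Pr(f) is constant for a given source sentence, so it drops out of the maximization; this is why only the language model Pr(e) and the translation model Pr(f|e) need to be estimated, while the search problem is to carry out the arg max efficiently.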
4. Approach based on Bayes' rule
[Architecture diagram: Source Language Text -> Transformation -> Global Search (maximize Pr(e) · Pr(f|e) over e) -> Inverse Transformation -> Target Language Text]
5. Model Details
- Trigram language model
- Translation model (simplified)
  - Lexicon probabilities
  - Fertilities
  - Class-based distortion probabilities (combined with the other factors in the sketch below)
- "Here, j is the currently covered input sentence position and j' is the previously covered input sentence position. The input sentence length J is included, since we would like to think of the distortion probability as normalized according to J." (Tillmann)
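A minimal sketch (not the paper's implementation) of how these simplified translation-model factors might be combined when one more source position j is covered; `lexicon`, `fertility`, `distortion`, and `word_class` are assumed pre-trained lookup tables, and the trigram language model is applied separately over the target words.

def partial_score(e, f_j, j, j_prev, J, phi,
                  lexicon, fertility, distortion, word_class):
    """Score the choice of target word e for source word f_j at position j,
    given the previously covered source position j_prev and sentence length J."""
    p_lex = lexicon.get((f_j, e), 1e-10)                              # p(f_j | e)
    p_fert = fertility.get((e, phi), 1e-10)                           # p(phi | e)
    p_dist = distortion.get((j, j_prev, J, word_class[f_j]), 1e-10)   # p(j | j', J, class(f_j))
    return p_lex * p_fert * p_dist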
6. Model Details (Model 4 vs. Model 3)
- Same except in the handling of distortion probabilities.
- In Model 4 there are 2 separate distortion probabilities: one for the head of a tablet and one for the rest of the words of the tablet.
- The probability depends on the previous tablet and on the identity (class) of the French word being placed (e.g., adjectives appear before nouns in English but after them in French).
- "We expect d1(-1 | A(e), B(f)) to be larger than d1(+1 | A(e), B(f)) when e is an adjective and f is a noun. Indeed, this is borne out in the trained distortion probabilities for Model 4, where we find that d1(-1 | A(government's), B(développement)) is 0.7986, while d1(+1 | A(government's), B(développement)) is 0.0168."
- A and B are class functions of the English and French words (in this implementation, 50 classes each; see the small illustration below).
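A tiny illustration of the two-table idea (hypothetical data structures; the keys pair a displacement with the word classes, and the values are the trained probabilities quoted above):

# Model-4-style distortion: separate tables for the head of a tablet (d1)
# and for the remaining words of a tablet (d_gt1).
d1 = {(-1, "A(government's)", "B(développement)"): 0.7986,
      (+1, "A(government's)", "B(développement)"): 0.0168}
d_gt1 = {}  # analogous table for the non-head words of a tablet

# The adjective/noun reordering preference shows up as d1(-1 | ...) >> d1(+1 | ...):
head_before = d1[(-1, "A(government's)", "B(développement)")]
head_after = d1[(+1, "A(government's)", "B(développement)")]
assert head_before > head_after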
7. Decoder
- Others have followed different approaches for decoders.
- This is the part where we have to be efficient!
- Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation, C. Tillmann, H. Ney
- DP-based beam search decoder for IBM Model 4 (this is the one described in the previous paper)
8. Example Alignment
[Figure: word-to-word alignment (source to target); the alignment is a hidden variable. Example sentence pair - Source (German): "In diesem Fall kann mein Kollege Sie nicht am vierten Mai besuchen ." Target (English): "In this case my colleague can not visit you on the fourth of May ."]
9. Inverted Alignments
- Inverted alignment (target to source)
- Coverage constraint: introduce a coverage vector
[Figure: inverted alignment path over target positions (i-1, i) and source positions]
10. Traveling Salesman Problem
- Problem: visit J cities
- Costs for transitions between cities
- Visit each city exactly once, minimizing the overall costs
- Dynamic programming (Held-Karp, 1962)
- Cities correspond to source sentence positions (words; coverage constraint)
- Costs: negative logarithm of the product of the translation, alignment, and language model probabilities
11. Traveling Salesman Problem
- DP with auxiliary quantity D(C, j): the cost of the shortest path from city 1 to city j visiting all cities in C exactly once (sketched in code below)
- Complexity using DP: O(J^2 · 2^J)
- The order in which the cities in C are visited is not important
- Only the cost of the best path reaching j has to be stored
- Remember: the minimum edit distance formulation was also a DP search problem
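A compact sketch of the Held-Karp recursion, with coverage sets represented as bitmasks; `cost[a][b]` is an assumed transition-cost matrix (in the MT setting, the negative log-probabilities listed above).

from itertools import combinations

def held_karp(cost):
    """Held-Karp DP for the TSP: cost[a][b] is the cost of moving a -> b.
    D[(C, j)] = cost of the cheapest path that starts at city 0, visits exactly
    the cities in bitmask C, and ends at city j (city 0 is always in C).
    Time complexity: O(J^2 * 2^J)."""
    J = len(cost)
    D = {(1, 0): 0}                                   # only city 0 visited, ending at 0
    for size in range(2, J + 1):
        for subset in combinations(range(1, J), size - 1):
            C = 1 | sum(1 << c for c in subset)       # bitmask including city 0
            for j in subset:
                prev_C = C & ~(1 << j)                # coverage before visiting j
                D[(C, j)] = min(D[(prev_C, k)] + cost[k][j]
                                for k in range(J)
                                if (prev_C, k) in D)
    full = (1 << J) - 1
    # close the tour by returning to city 0
    return min(D[(full, j)] + cost[j][0] for j in range(1, J))

# Example with 4 cities (symmetric costs):
cost = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(held_karp(cost))   # -> 18 (optimal tour cost)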
12. [Figure: DP state graph for a 5-city example. Each node is labeled with the set of visited cities followed by the last visited city, e.g. (1,2,3 ; 3); the search expands from the start state (1 ; 1) to the final states (1,2,3,4,5 ; j).]
13. [Same DP state graph as the previous slide.]
14. M.T. Recursion Equation
- Maximum approximation
- Q_{e'}(e, C, j) is the probability of the best partial hypothesis (e_1 ... e_i, b_1 ... b_i) where C = {b_k | k = 1, ..., i}, b_i = j, e_i = e, and e_{i-1} = e' (recursion sketched below)
- Complexity: O(E^3 · J^2 · 2^J), where E is the size of the target-language vocabulary (still too large)
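A sketch of the recursion for the simplified model (lexicon, class-based distortion, and trigram LM only; fertilities omitted), using the quantity Q defined above:

  Q_{e'}(e, C, j) \;=\; p(f_j \mid e) \cdot
      \max_{e'',\; j' \in C \setminus \{j\}}
      \Big[\, p(j \mid j', J) \cdot p(e \mid e', e'') \cdot
              Q_{e''}(e', C \setminus \{j\}, j') \,\Big]

The maximum approximation enters here: instead of summing over alignments, only the best predecessor pair (e'', j') is kept for each state.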
15. DP-based Search Algorithm
16. IBM-Style Re-ordering (S3)
- Procedural restriction: select one of the first 4 empty (uncovered) positions to extend the hypothesis (see the sketch below)
- Upper bound for word reordering complexity
[Figure: source positions 1 ... j ... J]
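A minimal sketch of the S3 restriction, assuming the coverage vector is a bitmask over source positions (hypothetical helper, not the paper's code):

def s3_candidates(coverage, J, window=4):
    """Return the source positions a hypothesis may cover next under the
    IBM-style (S3) restriction: only the first `window` uncovered positions,
    counted from the left, are allowed."""
    uncovered = [j for j in range(J) if not (coverage >> j) & 1]
    return uncovered[:window]

# Example: positions 0 and 2 already covered in a 6-word sentence (bitmask 0b000101)
print(s3_candidates(0b000101, 6))   # -> [1, 3, 4, 5]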
17. Verb Group Re-ordering (GE)
- Complexity: mostly monotonic traversal from left to right
[Alignment figure for the same example sentence pair, illustrating verb group re-ordering.]
18. Beam Search Pruning
- Search proceeds cardinality-synchronously over coverage vectors
- Three pruning types:
  - Coverage pruning
  - Cardinality pruning
  - Observation pruning (the number of words produced by a source word f is limited)
19. Beam Search Pruning
- 4 kinds of thresholds:
  - the coverage pruning threshold t_C
  - the coverage histogram threshold n_C
  - the cardinality pruning threshold t_c (looks only at the cardinality)
  - the cardinality histogram threshold n_c (looks only at the cardinality)
- Define new probabilities based on the uncovered positions (using only trigram and lexicon probabilities).
- Maintain only the hypotheses above the thresholds.
20. Beam Search Pruning
- Compute the best score and apply the thresholds:
  - for each coverage vector
  - for each cardinality
- Use histogram pruning
- Observation pruning: for each source word, select the best target word(s) (loop structure sketched below)
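A skeletal sketch of how these pruning steps might fit into the cardinality-synchronous search loop. The data structures are hypothetical: `init_hyp`, `extend`, and `score` are assumptions supplied by the caller, the thresholds are those from slide 19, and the rest-cost estimate over uncovered positions mentioned there is assumed to be folded into `score`.

from collections import defaultdict

def beam_search(J, init_hyp, extend, score,
                t_C=0.01, n_C=50, t_c=0.001, n_c=500):
    """Cardinality-synchronous beam search over coverage bitmasks (a sketch).

    init_hyp      -- the empty starting hypothesis
    extend(h, j)  -- yields new hypotheses that additionally cover source position j
                     (observation pruning is assumed to happen inside extend:
                      only the best few target words per source word are tried)
    score(h)      -- probability of the partial hypothesis h
    """
    beam = [(0, init_hyp)]                          # list of (coverage bitmask, hypothesis)
    for c in range(J):                              # cardinality-synchronous loop
        if not beam:
            break
        # cardinality pruning + histogram: compare against the best at this cardinality
        best_c = max(score(h) for _, h in beam)
        beam = [(C, h) for C, h in beam if score(h) >= t_c * best_c]
        beam = sorted(beam, key=lambda x: score(x[1]), reverse=True)[:n_c]

        # coverage pruning + histogram: compare within each coverage vector
        by_cov = defaultdict(list)
        for C, h in beam:
            by_cov[C].append(h)
        new_beam = []
        for C, lst in by_cov.items():
            best_C = max(score(h) for h in lst)
            kept = [h for h in lst if score(h) >= t_C * best_C]
            kept = sorted(kept, key=score, reverse=True)[:n_C]
            # extend the surviving hypotheses to every still-uncovered position
            for h in kept:
                for j in range(J):
                    if not (C >> j) & 1:
                        for nh in extend(h, j):
                            new_beam.append((C | (1 << j), nh))
        beam = new_beam
    full = (1 << J) - 1
    finished = [h for C, h in beam if C == full]
    return max(finished, key=score, default=None)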
21. German-English Verbmobil
- German to English, IBM Model 4
- Evaluation measures: m-WER and SSER
- Training: 58K sentence pairs
- Vocabulary: 8K (German), 5K (English)
- TEST-331 (held-out data; used to tune the scaling factors for the language and distortion models)
- TEST-147 (evaluation)
22. Effect of Coverage Pruning
23. TEST-147 Translation Results
24. References
- Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation, C. Tillmann, H. Ney
- A DP-based Search Using Monotone Alignments in Statistical Translation, C. Tillmann, S. Vogel, H. Ney, A. Zubiaga
- The Mathematics of Statistical Machine Translation: Parameter Estimation, Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, Robert L. Mercer
- Accelerated DP Based Search for Statistical Translation, C. Tillmann, S. Vogel, H. Ney, A. Zubiaga, H. Sawaf
- Word Re-ordering and DP-based Search in Statistical Machine Translation, H. Ney, C. Tillmann