DP-based Search Algorithms for Statistical Machine Translation

Slides: 25
Provided by: ladati
Transcript and Presenter's Notes

1
DP-based Search Algorithms for Statistical
Machine Translation
  • Presenter: Mauricio Zuluaga
  • Based on Christoph Tillmann's presentation and on
    "Word Reordering and a Dynamic Programming Beam
    Search Algorithm for Statistical Machine
    Translation", C. Tillmann, H. Ney

2
Computational Challenges in M.T.
  • Source sentence f (French)
  • Target sentence e (English)
  • Bayes' rule:
  • Pr(e|f) = Pr(e) Pr(f|e) / Pr(f)

3
Computational Challenges in M.T.
  • Estimating the language model probability Pr(e)
    (language modeling problem; trigram model)
  • Estimating the translation model probability
    Pr(f|e) (translation modeling problem)
  • Finding an efficient way to search for the
    English sentence that maximizes the product
    Pr(e) Pr(f|e) (search problem). We want to focus
    only on the most likely hypotheses during the
    search.
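The decision rule behind the three problems above can be sketched in a few lines. This is only an illustration: the two candidate sentences and all probability values are invented, and the dictionaries `language_model` and `translation_model` are hypothetical stand-ins for trained models. Since Pr(f) is the same for every candidate, it can be dropped from the maximization.

```python
import math

# Hypothetical toy values, invented purely for illustration.
language_model = {"my colleague": 0.02, "colleague my": 0.0001}   # Pr(e)
translation_model = {"my colleague": 0.3, "colleague my": 0.3}    # Pr(f | e) for one fixed f


def decode(candidates):
    """Return the candidate e that maximizes log Pr(e) + log Pr(f | e)."""
    def score(e):
        return math.log(language_model[e]) + math.log(translation_model[e])
    return max(candidates, key=score)


best = decode(["my colleague", "colleague my"])
```

Here the translation model cannot tell the two word orders apart, so the language model decides. A real decoder cannot enumerate all candidates like this, which is why the search problem needs its own algorithm.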

4
Approach based on Bayes rule
[Diagram: Source Language Text → Transformation → Global Search (maximize Pr(e) Pr(f|e) over e) → Inverse Transformation → Target Language Text]
5
Model Details
  • Trigram language model
  • Translation model (simplified)
  • Lexicon probabilities
  • Fertilities
  • Class-based distortion probabilities
  • "Here, j is the currently covered input sentence
    position and j' is the previously covered input
    sentence position. The input sentence length J is
    included, since we would like to think of the
    distortion probability as normalized according to
    J." (Tillmann)

6
Model Details (Model 4 vs. Model 3)
  • The two models are the same except in the handling
    of distortion probabilities.
  • In Model 4 there are 2 separate distortion
    probabilities: one for the head of a tablet and
    one for the rest of the words of the tablet.
  • The probability depends on the previous tablet and
    on the identity (class) of the French word being
    placed (e.g., adjectives appear before nouns in
    English but after them in French).
  • "We expect d_1(-1 | A(e), B(f)) to be larger than
    d_1(+1 | A(e), B(f)) when e is an adjective and f
    is a noun. Indeed, this is borne out in the
    trained distortion probabilities for Model 4,
    where we find that d_1(-1 | A(government's),
    B(développement)) is 0.7986, while
    d_1(+1 | A(government's), B(développement)) is
    0.0168."
  • A and B are class functions of the English and
    French words (in this implementation, A and B each
    map to 50 classes)

7
Decoder
  • Other researchers have taken different approaches
    to decoding
  • This is the part where efficiency matters most
  • "Word Reordering and a Dynamic Programming Beam
    Search Algorithm for Statistical Machine
    Translation", C. Tillmann, H. Ney
  • DP-based beam search decoder for IBM Model 4
    (the decoder described in the paper above)

8
Example Alignment
Word-to-Word Alignment (source to target)
Hidden Alignment
[Figure: alignment grid. Source axis (German): "In diesem Fall kann mein Kollege Sie am vierten Mai nicht besuchen ." Target axis (English): "In this case my colleague can not visit you on the fourth of May ."]
9
Inverted Alignments
Inverted alignment (target to source)
Coverage constraint: introduce a coverage vector
[Figure: alignment grid with target positions (i - 1, i) on one axis and source positions on the other]
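As a small illustration of the coverage constraint, the sketch below represents an inverted alignment as a list `b`, where `b[i]` is the source position covered by the (i+1)-th target word. The function names and the list representation are assumptions made for this example, not notation from the slides.

```python
def coverage(b):
    """Coverage set C = {b_k : k = 1..i} of an inverted alignment b (target -> source)."""
    if len(set(b)) != len(b):
        raise ValueError("a source position is covered more than once")
    return frozenset(b)


def is_complete(b, J):
    """True if every source position 1..J is covered exactly once."""
    return coverage(b) == frozenset(range(1, J + 1))


partial = coverage([1, 3])        # two target words produced so far
done = is_complete([1, 3, 2], 3)  # all three source positions covered
```

Tracking only the set of covered positions, rather than the order in which they were covered, is what later allows hypotheses to be merged in the dynamic programming search.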
10
Traveling Salesman Problem
  • Problem: visit J cities
  • Costs for transitions between cities
  • Visit each city exactly once, minimizing overall
    costs
  • Dynamic programming (Held-Karp 1962)
  • Cities correspond to source sentence positions
    (words, coverage constraint)
  • Costs: negative logarithm of the product of the
    translation, alignment and language model
    probabilities

11
Traveling Salesman Problem
  • DP with auxiliary quantity D(C, j):
    cost of the shortest path from
    city 1 to city j
    visiting all cities
    in C
  • Complexity using DP: O(J^2 · 2^J)
  • The order in which the cities in C are visited is
    not important
  • Only the cost of the best path reaching j has to
    be stored
  • Recall: the minimum edit distance formulation was
    also a DP search problem
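The Held-Karp recursion can be sketched as follows. This is a minimal version for the classic closed tour (returning to the start city), with cities numbered from 0 instead of 1; the function name `held_karp` and the cost-matrix input format are choices made for this example. The table `D` plays the role of the auxiliary quantity: `D[(C, j)]` is the cost of the cheapest path from the start city to city `j` that visits exactly the cities in `C`.

```python
from itertools import combinations


def held_karp(cost):
    """Cost of the cheapest tour starting at city 0, visiting every city once, returning to 0.

    cost[i][j] is the transition cost from city i to city j.
    """
    n = len(cost)
    if n == 1:
        return 0
    # Base case: paths that go directly from city 0 to city j.
    D = {(frozenset([0, j]), j): cost[0][j] for j in range(1, n)}
    # Grow the visited set one city at a time.
    for size in range(3, n + 1):
        for subset in combinations(range(1, n), size - 1):
            C = frozenset(subset) | {0}
            for j in subset:
                prev = C - {j}
                # Only the best path into (C, j) is stored; the visiting order inside C is forgotten.
                D[(C, j)] = min(D[(prev, k)] + cost[k][j] for k in subset if k != j)
    full = frozenset(range(n))
    return min(D[(full, j)] + cost[j][0] for j in range(1, n))


example = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
tour_cost = held_karp(example)
```

In the translation setting there is no return to the start city: the search ends once all source positions are covered, so the final minimization would simply run over `D[(full, j)]`.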

12
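[Figure: DP state graph for a 5-city tour. Each node lists the set of visited cities followed by the last visited city; the layer with 4 visited cities is not shown.]
Start: (1,1)
2 cities: (1,2,2) (1,3,3) (1,4,4) (1,5,5)
3 cities: (1,2,3,2) (1,2,3,3) (1,2,4,2) (1,2,4,4) (1,2,5,2) (1,2,5,5) (1,3,4,3) (1,3,4,4) (1,3,5,3) (1,3,5,5) (1,4,5,4) (1,4,5,5)
5 cities: (1,2,3,4,5,2) (1,2,3,4,5,3) (1,2,3,4,5,4) (1,2,3,4,5,5)
Final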
13
[Figure: the DP state graph from the previous slide, repeated]
14
M.T. Recursion Equation
Maximum approximation
Q_{e'}(e, C, j) is the probability of the best partial
hypothesis (e_1..e_i, b_1..b_i), where C = {b_k | k =
1..i}, b_i = j, e_i = e, and e_{i-1} = e'
Complexity: O(E^3 · J^2 · 2^J), where E is
the size of the target language vocabulary
(still too large)
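The recursion equation itself did not survive in this transcript. One plausible form, given here only as a sketch, assumes a simplified model with exactly one target word per source position, a lexicon probability p(f_j | e), a distortion probability p(j | j', J) and a trigram language model p(e | e', e''); fertility terms are omitted, and the equation on the original slide may differ in detail.

```latex
Q_{e'}(e, C, j) \;=\; p(f_j \mid e) \cdot
  \max_{\substack{e'' \\ j' \in C \setminus \{j\}}}
  \Big\{\, p(j \mid j', J) \cdot p(e \mid e', e'') \cdot
  Q_{e''}\big(e',\, C \setminus \{j\},\, j'\big) \Big\}
```

Under this form, each of the 2^J · J coverage states (C, j) carries E^2 word pairs (e', e), and each update maximizes over J predecessor positions j' and E predecessor words e'', which is consistent with a complexity of order E^3 · J^2 · 2^J.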
15
DP-based Search Algorithm
16
IBM-Style Re-ordering (S3)
  • Procedural restriction: select one of the first 4
    empty positions (to extend the hypothesis)
  • Upper bound for word reordering complexity

[Figure: source sentence positions 1 ... j ... J]
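The procedural restriction can be illustrated with a short helper that lists which source positions may be covered next. The function name `ibm_extensions` and the use of a Python set for the coverage vector are assumptions of this sketch; only the rule itself (pick one of the first 4 uncovered positions) comes from the slide.

```python
def ibm_extensions(covered, J, window=4):
    """Source positions (1-based) that may be covered next under the restriction.

    covered: set of already covered source positions.
    Only the first `window` uncovered positions, scanning left to right, are allowed.
    """
    uncovered = [j for j in range(1, J + 1) if j not in covered]
    return uncovered[:window]


next_positions = ibm_extensions({1, 2, 4}, J=8)
```

With positions 1, 2 and 4 covered in a sentence of length 8, the hypothesis can only be extended at positions 3, 5, 6 or 7; position 8 becomes available once one of those is covered.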
17
Verb Group Re-ordering (GE)
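Complexity:
Mostly monotonic traversal from left to right
[Figure: alignment grid for the example sentence pair. Source (German) words: In, diesem, Fall, kann, mein, Kollege, Sie, am, vierten, Mai, nicht, besuchen, "." Target (English): "In this case my colleague can not visit you on the fourth of May ."]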
18
Beam Search Pruning
  • Search proceeds cardinality-synchronously over
    coverage vectors
  • Three pruning types:
      • Coverage pruning
      • Cardinality pruning
      • Observation pruning (the number of words
        produced by a source word f is limited)

19
Beam Search Pruning
  • 4 kinds of thresholds:
      • the coverage pruning threshold tC
      • the coverage histogram threshold nC
      • the cardinality pruning threshold tc (looks
        only at the cardinality)
      • the cardinality histogram threshold nc (looks
        only at the cardinality)
  • Define new probabilities based on uncovered
    positions (using only trigram and lexicon
    probabilities).
  • Keep only the hypotheses that are above the
    thresholds.

20
Beam Search Pruning
  • Compute the best score and apply the threshold:
      • for each coverage vector
      • for each cardinality
  • Use histogram pruning
  • Observation pruning: for each source word, select
    the best target words
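Threshold and histogram pruning can be sketched with one generic helper. Everything here is illustrative: hypotheses are reduced to hypothetical `(coverage, score)` pairs with probabilities as scores, and the threshold is assumed to be a factor applied to the best score in each group, which may not match the exact definition used in the original system. Grouping by the coverage set gives coverage pruning; grouping by its size gives cardinality pruning.

```python
from collections import defaultdict


def prune(hyps, key, threshold, histogram):
    """Keep hypotheses that survive threshold and histogram pruning within each group.

    hyps:      list of (coverage, score); coverage is a frozenset of source positions.
    key:       maps a coverage set to its group (identity for coverage pruning,
               len for cardinality pruning).
    threshold: keep a hypothesis only if score >= threshold * best score in its group.
    histogram: keep at most this many hypotheses per group.
    """
    groups = defaultdict(list)
    for cov, score in hyps:
        groups[key(cov)].append((cov, score))
    kept = []
    for group in groups.values():
        group.sort(key=lambda h: h[1], reverse=True)
        best = group[0][1]
        survivors = [h for h in group if h[1] >= threshold * best]
        kept.extend(survivors[:histogram])
    return kept


hyps = [
    (frozenset({1}), 0.5),
    (frozenset({1}), 0.4),
    (frozenset({1}), 0.01),
    (frozenset({2}), 0.2),
]
by_coverage = prune(hyps, key=lambda c: c, threshold=0.1, histogram=2)
by_cardinality = prune(hyps, key=len, threshold=0.5, histogram=10)
```

In the coverage run, the hypothesis with score 0.01 falls below a tenth of the best score for its coverage set and is dropped. In the cardinality run, all four hypotheses compete in one group because each covers a single position, so the hypothesis for position 2 is dropped as well.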

21
German-English Verbmobil
  • German to English, IBM Model 4
  • Evaluation measures: m-WER and SSER
  • Training: 58K sentence pairs
  • Vocabulary: 8K (German), 5K (English)
  • Test-331: held-out data (used for the scaling
    factors of the language and distortion models)
  • Test-147: evaluation

22
Effect of Coverage Pruning
23
TEST-147 Translation Results
24
References
  • Word Reordering and a Dynamic Programming Beam
    Search Algorithm for Statistical Machine
    Translation, C. Tillmann, H. Ney
  • A DP based Search Using Monotone Alignments in
    Statistical Translation, C. Tillmann, S. Vogel,
    H. Ney, A. Zubiaga
  • The Mathematics of Statistical Machine
    Translation: Parameter Estimation, Peter F. Brown,
    Vincent J. Della Pietra, Stephen A. Della Pietra,
    Robert L. Mercer
  • Accelerated DP Based Search for Statistical
    Translation, C. Tillmann, S. Vogel, H. Ney, A.
    Zubiaga, H. Sawaf
  • Word Re-ordering and DP-based Search in
    Statistical Machine Translation, H. Ney, C.
    Tillmann