Title: Decoding and Reordering
1 Decoding and Reordering
2 Outline
- A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation
- Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy
- Forest Rescoring: Faster Decoding with Integrated Language Models
- An Efficient Two-Pass Approach to Synchronous-CFG Driven Statistical MT
3 Syntax-based Reordering for Phrase-based Decoding
- Phrase-based decoding distinguishes local from global reordering according to the distortion limit (see the distance-based reordering model of Koehn et al., 2003, for details)
4 Syntax-based Reordering for Phrase-based Decoding
- Syntax, a potential solution to global reordering, gives the decoder a reordered input
- If a single reordered input works, how about an n-best list of reordered inputs?
5-8 Translation Models
- Two setups are compared: one reordered input vs. an n-best list of reordered inputs
(Figure, built up over slides 5-8: the source sentence S is expanded into an n-best list of reordered inputs S1, S2, ..., Sn; each Si is translated by phrase-based decoding into Ti; the best translation T is then selected from T1, T2, ..., Tn.)
9 Select the best translation
- P(C|E), P(E|C), and P(LM): the same feature functions as in phrase-based SMT
- Plus the probability of reordering S into S' (a sketch of the combined score is given below)
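A plausible reading of slides 9-10 is a log-linear combination of the standard phrase-based features with the reordering probability. The correspondence of P(C|E), P(E|C), and P(LM) to P(S'|T), P(T|S'), and P_LM(T), as well as the weights λ_i, are assumptions here, not the slide's original equation.

```latex
\hat{T} \;=\; \operatorname*{arg\,max}_{T,\,S'}
  \Big[\, \lambda_1 \log P(S' \mid T)
        + \lambda_2 \log P(T \mid S')
        + \lambda_3 \log P_{\mathrm{LM}}(T)
        + \lambda_4 \log \Pr(S \to S') \Big]
```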
10 Select the best translation
11 Acquisition of Reordering Knowledge
- Given a node N in the parse tree of a source-language sentence, reordering knowledge can be extracted from the relative order of its child phrases p_i and the corresponding target-language phrases T(p_i)
- For simplicity, only the case of binary nodes is considered
12 Acquisition of Reordering Knowledge
13 Two kinds of representations
- Reordering rules
- Z is the phrase label of a binary node
- X and Y are the phrase labels of Z's children
- Pr(IN-ORDER) and Pr(INVERTED) are the probabilities that X and Y keep or invert their order in the target language
- The probabilities are estimated by Maximum Likelihood Estimation (a counting sketch follows below)
14 Two kinds of representations
- Maximum Entropy Model (a binary classification problem: IN-ORDER vs. INVERTED)
- Features that may be used:
- Leftmost word
- Rightmost word
- Head word
- Context words
- POS tags
- All features above can be extracted from source phrases as well as from target phrases (a feature-extraction sketch follows below)
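A sketch of feature extraction for the maximum-entropy classifier, assuming each child phrase comes as a small record with its tokens, POS tags, and head word; the template names are illustrative rather than the paper's exact features.

```python
def extract_features(left, right):
    """Features for a binary node whose children are the phrases `left` and `right`.

    Each phrase is assumed to be a dict with keys "tokens", "pos", and "head"
    (hypothetical representation).  The same templates can be instantiated on
    the aligned target phrases as well.
    """
    feats = []
    for side, p in (("L", left), ("R", right)):
        feats.append(f"{side}_leftmost={p['tokens'][0]}")    # leftmost word
        feats.append(f"{side}_rightmost={p['tokens'][-1]}")  # rightmost word
        feats.append(f"{side}_head={p['head']}")             # head word
        feats.append(f"{side}_leftmost_pos={p['pos'][0]}")   # POS of leftmost word
        feats.append(f"{side}_rightmost_pos={p['pos'][-1]}") # POS of rightmost word
    return feats
```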
15 The Application of Reordering Knowledge
- Let Pr(p -> p') denote the probability of reordering a phrase p into p'
(Figure: the recursive definition of Pr(p -> p') for the unary-node and binary-node cases.)
16 The Application of Reordering Knowledge
- The number of reordered sentences S' grows exponentially; let R(N) be the number of reorderings of the phrase yielded by N
- Traverse the source-language tree bottom-up; at each node, keep only the n reordered phrases p' with the highest reordering probability (see the sketch below)
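A sketch of the bottom-up beam described on slide 16, assuming a parse tree of `Node` objects with `word`/`children` attributes and a `reorder_prob(node, inverted)` callback supplied by the rules or the MaxEnt model; all names are hypothetical.

```python
import heapq

def nbest_reorderings(node, reorder_prob, n=10):
    """Return up to n (probability, token list) pairs for the subtree at `node`."""
    if not node.children:                                   # leaf: one word, one ordering
        return [(1.0, [node.word])]
    if len(node.children) == 1:                             # unary node: pass through
        return nbest_reorderings(node.children[0], reorder_prob, n)

    left = nbest_reorderings(node.children[0], reorder_prob, n)
    right = nbest_reorderings(node.children[1], reorder_prob, n)

    candidates = []
    for pl, l in left:
        for pr, r in right:
            candidates.append((pl * pr * reorder_prob(node, False), l + r))  # keep order
            candidates.append((pl * pr * reorder_prob(node, True), r + l))   # invert
    # keep only the n reorderings with the highest probability at this node
    return heapq.nlargest(n, candidates, key=lambda c: c[0])
```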
17 Remedy for data sparseness
- If T(p1) and T(p2) overlap, the node N with children N1 and N2 is not taken as a training instance
- This greatly reduces the amount of training data
- Remedy: remove some of the less probable alignment points so as to minimize overlapping phrases
18 Decoding
- The greedy reordering algorithm above tends to focus on one particular clause of a long sentence: for many long sentences containing several clauses, only one of the clauses gets reordered. The fix is to split the sentence into clauses first, as the following slides illustrate.
19-26 Decoding
(Figure, built up incrementally over slides 19-26: the source sentence S is split into clauses C1, ..., Cn (clause splitting); clause reordering produces reordered sentences S1, ..., Sm, where Sj consists of clauses Cj1, Cj2, ..., Cjn; in-clause reordering expands a clause such as Cj2 into candidates Cj21, Cj22, ..., Cj2n; each candidate is translated, giving T(Cj21), ..., T(Cj2n), from which the best clause translation T(Cj2) is selected; the clause translations are merged into T(Sj), and the sentence translations are composed so that the best overall translation T(S) can be selected.)
27 Binarizing Syntax Trees for Syntax-Based MT
- Substructures of an n-ary tree cannot be reused
- A solution is to binarize the syntax trees
- Simple methods include left-, right-, and head-binarization, and their combinations
28-29 Left/Right binarization
30-31 Definition: Left Binarization
- The left binarization of node n factorizes the leftmost r-1 children by forming a new node n' to dominate them, leaving the last child n_r untouched, and then making the new node n' the left child of n
32-33 Definition: Right Binarization
- The right binarization of node n factorizes the rightmost r-1 children by forming a new node n' to dominate them, leaving the first child n_1 untouched, and then making the new node n' the right child of n
34 Definition: Head Binarization
- Left-binarize n if the head is the first child; otherwise right-binarize it. Right-binarization is preferred if both are applicable.
- Keep the head in the pushed-down part. (A sketch of left/right/head binarization follows below.)
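A sketch of left-, right-, and head-binarization under an assumed `Node` representation (label, children, index of the head child); only the spine introduced by the new n' nodes is binarized here, with child subtrees handled separately.

```python
class Node:
    def __init__(self, label, children=None, head=0):
        self.label = label              # phrase label, e.g. "NP"
        self.children = children or []  # list of Node
        self.head = head                # index of the head child

def left_binarize(n):
    """Factor the leftmost r-1 children under a new node n' (slides 30-31)."""
    if len(n.children) <= 2:
        return n
    n_prime = Node(n.label + "'", n.children[:-1])
    return Node(n.label, [left_binarize(n_prime), n.children[-1]])

def right_binarize(n):
    """Factor the rightmost r-1 children under a new node n' (slides 32-33)."""
    if len(n.children) <= 2:
        return n
    n_prime = Node(n.label + "'", n.children[1:])
    return Node(n.label, [n.children[0], right_binarize(n_prime)])

def head_binarize(n):
    """Left-binarize if the head is the first child, otherwise right-binarize,
    preferring right-binarization when both apply (slide 34)."""
    if len(n.children) <= 2:
        return n
    return left_binarize(n) if n.head == 0 else right_binarize(n)
```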
35 Parallel binarization
- Transform a parse tree into a packed binarization forest
- The packed forest is composed of additive forest nodes and multiplicative forest nodes
36 Procedure
- Given a tree node n that has children n1, ..., nr:
- Recursively parallel-binarize the children n1, ..., nr, producing binarization forest nodes
- Right-binarize n if the contiguous subset of children n2, ..., nr is factorizable: insert a new label n', recursively parallel-binarize n' to generate a binarization forest node, then form a multiplicative forest node as the parent of the resulting forest nodes
- Left-binarization is similar to right-binarization above, except that the subset is n1, ..., nr-1; it likewise forms a multiplicative forest node
- Form an additive forest node as the parent of the multiplicative nodes (a sketch follows below)
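A rough sketch of the parallel-binarization procedure of slide 36, reusing the `Node` class from the previous sketch. It produces a packed forest whose additive nodes act like OR nodes and whose multiplicative nodes act like AND nodes; the `factorizable` predicate, which in the paper consults the word alignment, is left as an assumed callback, and the fallback when neither span is factorizable is an assumption.

```python
def parallel_binarize(n, factorizable):
    """Return a packed-forest node for tree node `n` (simplified sketch).

    Forest nodes are plain tuples: ("AND", label, children) is multiplicative,
    ("OR", label, alternatives) is additive.
    """
    if len(n.children) <= 2:
        # nothing to factor: just recurse into the children
        return ("AND", n.label, [parallel_binarize(c, factorizable) for c in n.children])

    alternatives = []
    # right binarization: factor the span n2, ..., nr if it is factorizable
    if factorizable(n.children[1:]):
        n_prime = Node(n.label + "'", n.children[1:])
        alternatives.append(("AND", n.label,
                             [parallel_binarize(n.children[0], factorizable),
                              parallel_binarize(n_prime, factorizable)]))
    # left binarization: factor the span n1, ..., n(r-1) if it is factorizable
    if factorizable(n.children[:-1]):
        n_prime = Node(n.label + "'", n.children[:-1])
        alternatives.append(("AND", n.label,
                             [parallel_binarize(n_prime, factorizable),
                              parallel_binarize(n.children[-1], factorizable)]))
    if not alternatives:
        # assumed fallback: right-binarize even if the span is not factorizable
        n_prime = Node(n.label + "'", n.children[1:])
        alternatives.append(("AND", n.label,
                             [parallel_binarize(n.children[0], factorizable),
                              parallel_binarize(n_prime, factorizable)]))
    # additive ("OR") node over the alternative binarizations
    return ("OR", n.label, alternatives)
```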
37 Example
(Figure: an example packed binarization forest; additive forest nodes behave like OR nodes, multiplicative forest nodes like AND nodes.)
38-41 Extract translation rule: Condition 1
(Figure, built up over slides 38-41: under Condition 1, Procedure-1 is called first and Procedure-2 is then called recursively.)
42-45 Extract translation rule: Condition 2
(Figure, built up over slides 42-45: under Condition 2, Procedure-2 is called first and Procedure-1 is then called recursively.)
46 Extract translation rule
- In this way we can build a derivation forest; by traversing the forest top-down recursively, we can extract rules at admissible forest nodes
47 Learning how to binarize via the EM algorithm
- Perform a set of binarization operations β on a parse tree t
- Each binarization β is the sequence of binarizations applied to the necessary nodes of t in pre-order
- Each binarization β results in a restructured tree t_β
- Extract rules from (t_β, f, a), generating a translation model with parameters θ (i.e., rule probabilities)
- Obtain the β that satisfies the training objective (a hedged sketch follows below)
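The formula at the end of slide 47 did not survive extraction. The following is only a generic sketch of what choosing a binarization by EM-trained rule probabilities typically looks like, with θ denoting the rule probabilities; it should not be read as the paper's exact objective.

```latex
% EM alternates between re-estimating the rule probabilities \theta and
% re-scoring the candidate binarizations; the chosen binarization is the one
% that maximizes the likelihood of the aligned pair (f, a) given t_\beta:
\hat{\beta} \;=\; \operatorname*{arg\,max}_{\beta}\; P\bigl(f, a \mid t_{\beta};\, \theta\bigr)
```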
48 Using the EM algorithm to choose restructuring
49 Forest Rescoring: Faster Decoding with Integrated Language Models
- Efficient decoding for phrase-based and syntax-based MT models is a difficult problem
- If the language model is fully integrated into the decoder, there is an expensive overhead for maintaining target-language boundary words during decoding
50 Some alternative methods
- Rescoring: produce a k-best list of candidate translations without the LM, then rerank the k-best list using the LM
- Forest rescoring, in two variants:
- Cube pruning
- Cube growing
51-54 Cube pruning: some details
- Avoid duplicate deductions
(Figure, built up over slides 51-54: items of the cube are extracted in best-first order; the first, second, and third extractions are highlighted in turn.)
55 Cube pruning: some details
(Figure: several cubes Cube1, Cube2, ..., Cuben feed their candidates through a shared heap into the stack.)
56 Cube pruning: some details
- Suppose we are decoding with a hierarchical phrase-based model. The dimensionality of the cube is at most 3, because each rule has at most 2 variables: the rule itself forms one dimension, while the two variables form the other two.
(Figure: Dimension 1 enumerates rules of the form X1 ... X2; Dimensions 2 and 3 enumerate the candidate sub-derivations for the two variables.)
57 Cube pruning: some details
- When we extract the best derivation from the top of the heap, we push at most 3 of its neighbors into the heap as candidates (see the sketch below):
- Nb1 = (i+1, j, k), Nb2 = (i, j+1, k), Nb3 = (i, j, k+1)
(Figure: the cube with its three dimensions indexed by i, j, and k.)
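A sketch of the cube-pruning loop of slides 51-57 for a single three-dimensional cube. The `score(i, j, k)` callback, which would combine the i-th rule with the j-th and k-th antecedent items under the LM, is an assumption, and duplicate deductions are avoided with a `visited` set.

```python
import heapq

def cube_prune(score, dims, k_best=100):
    """Pop up to k_best cells of a 3-D cube in (approximately) best-first order.

    `score(i, j, k)` returns the LM-integrated cost of the cell (assumed
    callback); `dims` = (I, J, K) gives the extent of each dimension.
    """
    start = (0, 0, 0)
    heap = [(score(*start), start)]
    visited = {start}                       # avoid duplicate deductions (slides 51-54)
    results = []

    while heap and len(results) < k_best:
        cost, (i, j, k) = heapq.heappop(heap)
        results.append((cost, (i, j, k)))
        # push at most 3 neighbors, one along each dimension (slide 57)
        for neighbor in ((i + 1, j, k), (i, j + 1, k), (i, j, k + 1)):
            if all(x < d for x, d in zip(neighbor, dims)) and neighbor not in visited:
                visited.add(neighbor)
                heapq.heappush(heap, (score(*neighbor), neighbor))
    return results
```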
58-63 Cube growing
- LazyJthBest(n), built up over slides 58-63: the candidate list cand is initialized by firing the (1,1) items, LazyJthBest(1) is invoked recursively for the required sub-derivations, and then, while |D(v)| < n and cand is not empty, the minimum item is popped and its neighbors Nb1, ..., Nbm are fired into cand. (A sketch follows below.)
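A sketch of the LazyJthBest / Fire recursion summarized on slides 58-63, over an assumed hypergraph representation (`incoming[v]` lists hyperedges as (tail node ids, rule cost) pairs, and `weight` combines a rule cost with the costs of the chosen sub-derivations). The LM-specific parts of cube growing (heuristic scores, buffering) are omitted.

```python
import heapq

class LazyJthBest:
    """Lazily compute the j-th best derivations of each hypergraph node."""

    def __init__(self, incoming, weight):
        self.incoming = incoming   # incoming[v] -> list of (tails, rule_cost) hyperedges
        self.weight = weight       # weight(rule_cost, tail_costs) -> combined cost
        self.D = {}                # D[v]: derivations found so far, best first
        self.cand = {}             # cand[v]: candidate heap
        self.seen = {}             # seen[v]: (edge, ranks) pairs already fired

    def lazy_jth_best(self, v, j):
        if v not in self.cand:     # first visit: Fire(1, ..., 1, cand) for every edge
            self.cand[v], self.seen[v], self.D[v] = [], set(), []
            for e in self.incoming.get(v, []):
                self._fire(v, e, (1,) * len(e[0]))
        while len(self.D[v]) < j and self.cand[v]:
            cost, e, ranks = heapq.heappop(self.cand[v])     # Pop_Min
            self.D[v].append((cost, (e, ranks)))
            for i in range(len(ranks)):                      # Fire(Nb1..Nbm, cand)
                self._fire(v, e, ranks[:i] + (ranks[i] + 1,) + ranks[i + 1:])
        return self.D[v]

    def _fire(self, v, e, ranks):
        if (e, ranks) in self.seen[v]:
            return                                           # avoid duplicate deductions
        self.seen[v].add((e, ranks))
        tails, rule_cost = e
        tail_costs = []
        for u, r in zip(tails, ranks):
            derivs = self.lazy_jth_best(u, r)                # recursive LazyJthBest
            if len(derivs) < r:
                return                                       # r-th sub-derivation missing
            tail_costs.append(derivs[r - 1][0])
        heapq.heappush(self.cand[v], (self.weight(rule_cost, tail_costs), e, ranks))
```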
64 Two-Pass Approach to Synchronous-CFG Driven SMT
- The first pass, corresponding to a severe parameterization of cube pruning, considers only the first-best (LM-integrated) chart item in each cell, while maintaining unexplored alternatives for second-pass consideration
- In the second pass, the search is driven by integrating long-distance and flexible-history n-gram LMs, rather than simply using such models for hypothesis rescoring
65 Thanks!