Title: Grammatical Machine Translation
1 Grammatical Machine Translation
- Stefan Riezler and John Maxwell
2 Overview
- Introduction
- Extracting F-Structure Snippets
- Parsing-Transfer-Generation
- Statistical Models and Training
- Experimental Evaluation
- Discussion
3 Section 1: Introduction
4 Introduction
- Recent approaches to SMT use
  - phrase-based SMT
  - syntactic knowledge
- Phrase-based SMT is great for
  - local ordering
  - short idiomatic expressions
- But not so good for
  - learning long-distance dependencies (LDDs)
  - generalising to unseen phrases that share non-overt linguistic info
5 Statistical Parsers
- Statistical parsers can provide information to
  - resolve LDDs
  - generalise to unseen phrases that share non-overt linguistic info
- Examples
  - Xia and McCord 2004
  - Collins et al. 2005
  - Lin 2004
  - Ding and Palmer 2005
  - Quirk et al. 2005
6 Grammar-based Generation
- Could grammar-based generation be useful for MT?
- Quirk et al. 2005
  - A simple statistical model outperforms the grammar-based generator of Menezes and Richardson 2001 on BLEU score
- Charniak et al. 2003
  - Parsing-based language modelling can improve the grammaticality of translations without improving BLEU score
- Perhaps BLEU score is not a sufficient way to test for grammaticality
- Further investigation needed
7 Grammatical Machine Translation
- Aim
  - Investigate incorporating a grammar-based generator into a dependency-based SMT system
- The authors present
  - a dependency-based SMT model
  - statistical components modelled on the phrase-based system of Koehn et al. 2003
- Also used
  - component weights adjusted using minimum error rate (MER) training (Och 2003)
  - a grammar-based generator
  - n-gram and distortion models
8 Section 2: Extracting F-Structure Snippets
9 Extracting F-Structure Snippets
- Source-language (SL) and target-language (TL) sentences of the bilingual corpus are parsed using LFG grammars
- For each English and German sentence pair
  - the two f-structures that most preserve dependencies are selected
  - many-to-many word alignments are used to create many-to-many correspondences between the substructures
  - the correspondences are the basis for deciding what goes into the basic transfer rules
10 Extracting F-Structure Snippets: Example
- Dafür bin ich zutiefst dankbar → I have a deep appreciation for that
- Word-by-word gloss: <for that> <am> <I> <deepest> <thankful>
- Many-to-many bidirectional word alignment
11 Transfer Rule Extraction Example
- From the aligned words we get the following substructure correspondences
12 Transfer Rule Extraction Example
- From the correspondences, two kinds of transfer rules are extracted
  - primitive transfer rules
  - complex transfer rules
- Transfer contiguity constraint
  - Source and target f-structures are each connected.
  - F-structures in the transfer source can only be aligned with f-structures in the transfer target, and vice versa.
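Both parts of the transfer contiguity constraint can be checked mechanically for a candidate rule. The Python sketch below assumes a simplified representation that the slides do not specify: each f-structure is an undirected graph whose nodes are named by their predicates, and the alignment is a list of (source node, target node) links. The function names and the toy graphs are hypothetical.

```python
from collections import deque

def is_connected(nodes, edges):
    """Return True if `nodes` induce a connected subgraph of the undirected graph `edges`."""
    nodes = set(nodes)
    if not nodes:
        return False
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:  # keep only edges inside the candidate snippet
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search from an arbitrary node
        for m in adjacency[queue.popleft()] - seen:
            seen.add(m)
            queue.append(m)
    return seen == nodes

def satisfies_contiguity(src_nodes, tgt_nodes, src_edges, tgt_edges, alignment):
    """Check both parts of the transfer contiguity constraint for one candidate rule."""
    src_nodes, tgt_nodes = set(src_nodes), set(tgt_nodes)
    # Part 1: the source snippet and the target snippet must each be connected.
    if not (is_connected(src_nodes, src_edges) and is_connected(tgt_nodes, tgt_edges)):
        return False
    # Part 2: no alignment link may cross the rule boundary in either direction.
    return all((s in src_nodes) == (t in tgt_nodes) for s, t in alignment)

# Toy graphs loosely modelled on the example sentence (hypothetical structure).
SRC_EDGES = [("sein", "ich"), ("sein", "dankbar"), ("dankbar", "zutiefst")]
TGT_EDGES = [("have", "I"), ("have", "appreciation"), ("appreciation", "deep")]
ALIGNMENT = [("sein", "have"), ("ich", "I"), ("dankbar", "appreciation"), ("zutiefst", "deep")]

print(satisfies_contiguity({"sein", "ich"}, {"have", "I"}, SRC_EDGES, TGT_EDGES, ALIGNMENT))  # True
```

Pairing {ich, dankbar} with {I, appreciation} fails part 1 (the source side is not connected), and pairing {sein, ich} with {have, appreciation} fails part 2 (ich is aligned to I, which lies outside the rule).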
13 Transfer Rule Extraction Example
- Primitive Rule 1
    pred(X1, sein), subj(X1, X2), xcomp(X1, X3)
    ==> pred(X1, have), subj(X1, X2), obj(X1, X3)
14 Transfer Rule Extraction Example
- Primitive Rule 2
    pred(X1, ich)
    ==> pred(X1, I)
15 Transfer Rule Extraction Example
- Primitive Rule 3
    pred(X1, dafür)
    ==> pred(X1, for), obj(X1, X2), pred(X2, that)
16 Transfer Rule Extraction Example
- Primitive Rule 4
    pred(X1, dankbar), adj(X1, X2), in_set(X3, X2), pred(X3, zutiefst)
    ==> pred(X1, appreciation), spec(X1, X2), pred(X2, a), adj(X1, X3), in_set(X4, X3), pred(X4, deep)
17 Transfer Rule Extraction Example
- Complex transfer rules
  - Primitive transfer rules that are adjacent in the f-structure are combined to form more complex rules
  - Example (combining rules 1 and 2 above)
      pred(X1, sein), subj(X1, X2), pred(X2, ich), xcomp(X1, X3)
      ==> pred(X1, have), subj(X1, X2), pred(X2, I), obj(X1, X3)
- In the worst case there can be an exponential number of combinations of primitive transfer rules, so the number of primitive rules used to form a complex rule is restricted to 3; this keeps the number of transfer rules at O(n^2) in the worst case.
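Combining adjacent primitive rules amounts to enumerating connected groups of at most three rules. The sketch below assumes that an adjacency relation between primitive rules is already available (a hypothetical input; the slides do not say how adjacency is stored) and grows each group one neighbouring rule at a time. For a star-shaped adjacency with one central rule and n neighbours, the three-rule groups number n(n-1)/2, which illustrates the quadratic growth the cap allows.

```python
def connected_combinations(rules, adjacent, max_size=3):
    """Return every set of at most `max_size` primitive rules that is connected
    under the `adjacent` relation (rule id -> set of neighbouring rule ids)."""
    frontier = {frozenset([r]) for r in rules}
    found = set(frontier)
    for _ in range(max_size - 1):
        grown = set()
        for combo in frontier:
            # Only add rules adjacent to the current combination, so it stays connected.
            neighbours = set().union(*(adjacent.get(r, set()) for r in combo)) - combo
            grown.update(combo | {n} for n in neighbours)
        frontier = grown - found
        found |= frontier
    return found

# Hypothetical adjacency for the four primitive rules of the example:
# rule 1 (sein/have) touches rules 2 (ich/I) and 4 (dankbar/appreciation), and 4 touches 3.
example = {1: {2, 4}, 2: {1}, 3: {4}, 4: {1, 3}}
combos = connected_combinations(example, example)
print(len(combos))  # 9: four single rules, three pairs, two triples
```

Groups such as {2, 3} are never produced, because rules 2 and 3 are not adjacent to each other and no chain of length three or less links them through a shared neighbour.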
18 Section 3: Parsing-Transfer-Generation
19 Parsing
- LFG grammars are used to parse source and target text
- A FRAGMENT grammar is used to augment the standard grammar, increasing robustness
- The correct parse is determined by the fewest-chunk method
20 Transfer
- Rules are applied to the source f-structure non-deterministically and in parallel
- Each fact of the German f-structure is translated by exactly one transfer rule
- A default rule is included that allows any fact to be translated as itself
- A chart is used to encode the translations
- Beam search decoding is used to select the most probable translations
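The transfer step can be pictured as an exact-cover search: each chart entry records which source facts a rule application consumes and which target facts it produces, the default rule lets any fact pass through unchanged, and a beam search keeps the best partial covers. This is a minimal Python sketch under stated assumptions, not the authors' implementation: facts are tuples such as ("pred", "f1", "sein"), rule variables are written X1, X2, ..., each rule carries a single hypothetical log-probability, and the default rule's score of -10.0 is an arbitrary placeholder. The real system scores hypotheses with features 1-10 rather than one number per rule.

```python
import heapq
import itertools
import re

def is_var(arg):
    """Rule variables are written X1, X2, ...; everything else is a constant."""
    return re.fullmatch(r"X\d+", arg) is not None

def match(patterns, facts, binding=None):
    """Yield every variable binding under which all patterns occur among the facts."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        if len(fact) != len(first):
            continue
        b = dict(binding)
        # A variable must bind consistently; a constant must match exactly.
        if all(b.setdefault(p, v) == v if is_var(p) else p == v for p, v in zip(first, fact)):
            yield from match(rest, facts, b)

def chart_entries(facts, rules, default_logprob):
    """Every rule application as (source facts covered, target facts produced, log-prob)."""
    fresh = itertools.count()
    entries = []
    for lhs, rhs, logprob in rules:
        for found in match(lhs, facts):
            b = dict(found)
            for var in (a for pat in rhs for a in pat if is_var(a) and a not in b):
                b[var] = f"new{next(fresh)}"  # target-only variables get fresh node ids
            covered = frozenset(tuple(b.get(a, a) for a in pat) for pat in lhs)
            produced = frozenset(tuple(b.get(a, a) for a in pat) for pat in rhs)
            entries.append((covered, produced, logprob))
    # Default rule: any fact may be translated as itself.
    entries += [(frozenset([f]), frozenset([f]), default_logprob) for f in facts]
    return entries

def transfer(facts, rules, beam_size=20, default_logprob=-10.0):
    """Beam search for high-scoring exact covers of `facts` by chart entries."""
    facts = frozenset(facts)
    entries = chart_entries(facts, rules, default_logprob)
    order = sorted(facts)
    beam = [(0.0, frozenset(), frozenset())]  # (score, covered source facts, target facts)
    while any(len(covered) < len(facts) for _, covered, _ in beam):
        candidates = []
        for score, covered, output in beam:
            if len(covered) == len(facts):
                candidates.append((score, covered, output))  # already complete
                continue
            nxt = next(f for f in order if f not in covered)
            for cov, prod, lp in entries:
                # Each source fact must be translated by exactly one rule.
                if nxt in cov and not cov & covered:
                    candidates.append((score + lp, covered | cov, output | prod))
        beam = heapq.nlargest(beam_size, candidates, key=lambda s: s[0])
    return beam

FACTS = {("pred", "f1", "sein"), ("subj", "f1", "f2"), ("pred", "f2", "ich"),
         ("xcomp", "f1", "f3"), ("pred", "f3", "dankbar")}
RULES = [  # (source patterns, target patterns, hypothetical log-probability)
    ([("pred", "X1", "sein"), ("subj", "X1", "X2"), ("xcomp", "X1", "X3")],
     [("pred", "X1", "have"), ("subj", "X1", "X2"), ("obj", "X1", "X3")], -1.0),
    ([("pred", "X1", "ich")], [("pred", "X1", "I")], -0.5),
    ([("pred", "X1", "dankbar")], [("pred", "X1", "appreciation")], -1.0),  # simplified
]
best_score, _, best_output = transfer(FACTS, RULES)[0]
print(best_score)  # -2.5
```

With an empty rule list, every fact falls through to the default rule and the output equals the input, which is the robustness behaviour the slide describes.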
21 Generation
- The generation method has to be fault-tolerant
  - the transfer system can be given a fragmentary parse as input
  - the transfer system can output an invalid f-structure
- Unknown predicates
  - default morphology is used to inflect the source stem for English
- Unknown structures
  - a default grammar is used that allows any attribute to be generated in any order with any category
22 Section 4: Statistical Models and Training
23 Statistical Components
- Modelled on the statistical components of Pharaoh (Koehn 2004)
- Pharaoh integrates 8 statistical models
  - relative frequency of phrase translations, source-to-target
  - relative frequency of phrase translations, target-to-source
  - lexical weighting, source-to-target
  - lexical weighting, target-to-source
  - phrase count
  - language model probability
  - word count
  - distortion probability
24 Statistical Components
- The following statistics are computed for each translation:
- 1. Log-probability of source-to-target transfer rules, where the probability r(e|f) of a rule that transfers source snippet f into target snippet e is estimated by the relative frequency r(e|f) = count(f, e) / Σ_e' count(f, e')
- 2. Log-probability of target-to-source rules
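Features 1 and 2 reduce to counting. A minimal sketch, assuming each extracted rule instance is recorded as a (source snippet, target snippet) pair, with snippets stood in for by strings; the function names and data are hypothetical.

```python
import math
from collections import Counter

def relative_frequencies(rule_occurrences):
    """rule_occurrences: one (source_snippet, target_snippet) pair per extracted rule instance.
    Returns r(e|f) and r(f|e), both keyed by (f, e)."""
    joint = Counter(rule_occurrences)
    src_counts = Counter(f for f, _ in rule_occurrences)
    tgt_counts = Counter(e for _, e in rule_occurrences)
    r_e_given_f = {(f, e): c / src_counts[f] for (f, e), c in joint.items()}
    r_f_given_e = {(f, e): c / tgt_counts[e] for (f, e), c in joint.items()}
    return r_e_given_f, r_f_given_e

def rule_logprob(rules_used, table):
    """Feature value: sum of log-probabilities of the rules used in one translation."""
    return sum(math.log(table[rule]) for rule in rules_used)

occurrences = [("ich", "I"), ("ich", "I"), ("ich", "me"), ("dankbar", "thankful")]
r_ef, r_fe = relative_frequencies(occurrences)
print(r_ef[("ich", "I")], r_fe[("ich", "me")])  # 2/3 and 1.0
```

Feature 1 sums log r(e|f) over the rules in a translation; feature 2 is the same sum looked up in the r(f|e) table.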
25 Statistical Components
- 3. Log-probability of lexical translations from source to target snippets, estimated from Viterbi alignments â between source word positions i = 1, ..., n and target word positions j = 1, ..., m for stems f_i and e_j in snippets f and e, with relative word translation frequencies t(e_j|f_i)
- 4. Log-probability of lexical translations from target to source snippets
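Feature 3 can be computed in the style of the lexical weighting of Koehn et al. 2003, which averages t(e_j|f_i) over the source words aligned to each target word and multiplies over target positions: l(e|f) = Π_j (1/|{i : (i, j) ∈ â}|) Σ_{(i, j) ∈ â} t(e_j|f_i). The sketch below assumes this form and returns its logarithm; the NULL handling for unaligned target words and the probability floor are assumptions. Feature 4 is the same computation with the roles of source and target swapped.

```python
import math

def lexical_logprob(src, tgt, alignment, t, null="NULL", floor=1e-10):
    """log l(e|f) = sum_j log( 1/|{i : (i, j) in a}| * sum_{(i, j) in a} t(e_j | f_i) )

    src, tgt:  stems f_1..f_n and e_1..e_m of the two snippets (0-based here)
    alignment: set of (i, j) links from the Viterbi alignment
    t:         dict mapping (e_word, f_word) -> t(e_word | f_word)
    Unaligned target words are scored against a NULL source word, and
    probabilities are floored to avoid log(0) (both are assumptions)."""
    total = 0.0
    for j, e in enumerate(tgt):
        links = [i for i, jj in alignment if jj == j]
        if links:  # average over all source words aligned to e_j
            p = sum(t.get((e, src[i]), 0.0) for i in links) / len(links)
        else:
            p = t.get((e, null), 0.0)
        total += math.log(max(p, floor))
    return total

t = {("I", "ich"): 0.8, ("thankful", "dankbar"): 0.5}  # hypothetical t(e|f) table
print(lexical_logprob(["ich", "dankbar"], ["I", "thankful"], {(0, 0), (1, 1)}, t))
```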
26 Statistical Components
- 5. Number of transfer rules
- 6. Number of transfer rules with frequency 1
- 7. Number of default transfer rules
- 8. Log-probability of strings of predicates from root to frontier of the target f-structure, estimated from predicate trigrams of English
- 9. Number of predicates in the target language
- 10. Number of constituent movements during generation, based on the original order of the head predicates of the constituents (for example, AP[2] BP[3] CP[1] counts as two movements, since the head predicate of CP moved from first to third position)
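The slide does not say exactly how movements are counted. One reading that reproduces the AP[2] BP[3] CP[1] example is to count every pair of constituents whose head predicates appear in the opposite order to the original; the sketch below implements that reading, which is an assumption, under a hypothetical function name.

```python
def count_movements(original_positions):
    """original_positions[k] is the original position of the head predicate of the
    k-th constituent in generated order; each out-of-order pair counts as one movement."""
    n = len(original_positions)
    return sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if original_positions[a] > original_positions[b]
    )

print(count_movements([2, 3, 1]))  # AP[2] BP[3] CP[1] -> 2
```

Under this reading, CP is out of order with respect to both AP and BP, giving the two movements described on the slide.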
27 Statistical Components
- 11. Number of generation repairs
- 12. Log-probability of the target string as computed by a trigram language model
- 13. Number of words in the target string
- Features 1-10 are used to choose the most probable target f-structure from the transfer chart
- Features 1-7 are tests on source and target f-structure snippets related via transfer rules
- Features 8-10 are language model and distortion features on the target c- and f-structures
- Features 11-13 are computed on the strings that are generated from the target f-structure
- The statistics are combined into a log-linear model whose parameters are adjusted by minimum error rate training.
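The log-linear combination itself is a weighted sum of feature values. In this sketch the feature names, weights and candidates are hypothetical, and the weights are simply given; minimum error rate training, which would tune them on the development set, is not shown.

```python
import math

def loglinear_score(features, weights):
    """Unnormalised log-linear score: sum over k of lambda_k * h_k."""
    return sum(weights[name] * value for name, value in features.items())

def rank(candidates, weights):
    """Return (translation, probability) pairs, best first."""
    scored = sorted(candidates, key=lambda c: loglinear_score(c[1], weights), reverse=True)
    scores = [loglinear_score(f, weights) for _, f in scored]
    top = scores[0]
    z = sum(math.exp(s - top) for s in scores)  # shift by the max for numerical stability
    return [(text, math.exp(s - top) / z) for (text, _), s in zip(scored, scores)]

weights = {"lm_logprob": 1.0, "word_count": -0.5}
candidates = [("translation a", {"lm_logprob": -2.0, "word_count": 3}),
              ("translation b", {"lm_logprob": -1.0, "word_count": 4})]
ranked = rank(candidates, weights)
print(ranked[0][0])  # translation b
```

A negative weight on a count feature such as word count acts as a penalty; its size is exactly what minimum error rate training would adjust.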
28 Section 5: Experimental Evaluation
29 Experimental Evaluation
- Europarl, German to English
  - Sentences of length 5-15 words
  - Training set: 163,141 sentences
  - Development set: 1,967 sentences
  - Test set: 1,755 sentences (same as Koehn et al. 2003)
- Bidirectional word alignment created from the word alignment of IBM Model 4 as implemented by GIZA++ (Och et al. 1999)
- Grammars achieve 100% coverage on unseen data
  - 80% as full parses
  - 20% as fragment parses
- 700,000 transfer rules extracted
- For language modelling, the trigram model of Stolcke 2002 is used
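The slides do not say which heuristic turned the two directional IBM Model 4 alignments into a bidirectional one. As an illustration only, the sketch below shows the two simplest combinations: the union, which can produce many-to-many links, and the intersection. The function name and argument conventions are hypothetical.

```python
def symmetrize(f_to_e, e_to_f, method="union"):
    """Combine two directional alignments into one set of (i, j) links.

    f_to_e: links (i, j) from the German-to-English alignment run
    e_to_f: links (j, i) from the English-to-German run (reversed indices)
    """
    flipped = {(i, j) for j, i in e_to_f}
    if method == "union":  # may contain one-to-many and many-to-many links
        return set(f_to_e) | flipped
    if method == "intersection":  # one-to-one if each input is many-to-one
        return set(f_to_e) & flipped
    raise ValueError(f"unknown method: {method}")

print(sorted(symmetrize({(0, 0), (1, 1)}, {(0, 0), (2, 1)})))  # [(0, 0), (1, 1), (1, 2)]
```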
30 Experimental Evaluation
- For translating the test set
  - 1 parse for each German sentence was used
  - 10 transferred f-structures
  - 1,000 generated strings for each transferred f-structure
- The most probable target f-structure is obtained by a beam search on the transfer chart using features 1-10 above, with a beam size of 20
- Features 11-13 are computed on the generated strings
31 Experimental Evaluation
- For automatic evaluation they used the NIST measure combined with the approximate randomization test (Noreen, 1989)
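The approximate randomization test estimates how often a score difference at least as large as the observed one arises when the two systems' outputs are randomly swapped sentence by sentence. A minimal sketch follows; since NIST is a corpus-level measure, the metric is passed in as a function over per-sentence statistics, and the plain mean used in the usage line is a stand-in for NIST, not the real measure. The add-one p-value estimate is a common conservative choice, assumed here.

```python
import random

def approximate_randomization(stats_a, stats_b, metric, trials=1000, seed=0):
    """Paired approximate randomization test for a difference in corpus-level scores.

    stats_a, stats_b: per-sentence statistics for systems A and B (same test sentences)
    metric:           maps a list of per-sentence statistics to a corpus-level score
    Returns an estimated p-value under the null hypothesis that A and B are interchangeable."""
    rng = random.Random(seed)
    observed = abs(metric(stats_a) - metric(stats_b))
    at_least_as_large = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(stats_a, stats_b):
            if rng.random() < 0.5:  # swap the two systems' outputs for this sentence
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        if abs(metric(shuffled_a) - metric(shuffled_b)) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (trials + 1)

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical per-sentence scores; a plain mean stands in for NIST here.
print(approximate_randomization([0.9, 0.8, 0.7, 0.9], [0.5, 0.6, 0.4, 0.5], mean))
```

A small p-value means that random swapping rarely reproduces the observed gap, so the difference between the systems is unlikely to be due to chance.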
32 Experimental Evaluation
- Manual evaluation
  - Aim: to separate the factors of grammaticality and translation adequacy
  - 500 sentences randomly extracted from in-coverage examples
  - 2 independent human judges
  - The judges were presented with the output of the phrase-based SMT system and of the LFG-based system in a blind test, and asked to state a preference for one of the translations based on
    - grammaticality / fluency
    - translational / semantic adequacy
33 Experimental Evaluation
- Promising results for examples that are in-coverage of the LFG grammars
- However, back-off to robustness techniques for parsing and generation results in a loss of translation quality
- Rule extraction problems
  - 20% of the parses are fragmentary
  - Errors occur in the rule extraction process, resulting in ill-formed transfer rules
- Parsing-transfer-generation problems
  - Parsing errors → errors in transfer → generation errors
  - In-coverage: disambiguation errors in parsing and transfer → suboptimal translation
34 Experimental Evaluation
- Despite the use of minimum error rate training and n-gram language models, the system cannot be used to maximize n-gram scores on reference translations in the same way as phrase-based systems, since statistical ordering models are employed in the framework after generation
- This gives preference to grammaticality over similarity to reference translations
35 Conclusion
- An SMT model that marries phrase-based SMT with traditional grammar-based MT
- The NIST measure showed that the results achieved are comparable with the phrase-based SMT system of Koehn et al. 2003 for in-coverage examples
- Manual evaluation showed significant improvements in both grammaticality and translational adequacy for in-coverage examples
36 Conclusion
- The system can determine whether or not a source sentence is in-coverage
- Possibility of a hybrid system that achieves improved grammaticality at state-of-the-art translation quality
- Future work
  - improving the translation of in-coverage source sentences, e.g. by stochastic generation
  - applying the system to other language pairs and data sets
37 References
- Miriam Butt, Helge Dyvik, Tracy King, Hiroshi Masuichi and Christian Rohrer. 2002. The Parallel Grammar Project.
- Eugene Charniak, Kevin Knight and Kenji Yamada. 2003. Syntax-based Language Models for Statistical Machine Translation.
- Michael Collins, Philipp Koehn and Ivona Kucerova. 2005. Clause Restructuring for Statistical Machine Translation.
- Philipp Koehn, Franz Och and Daniel Marcu. 2003. Statistical Phrase-based Translation.
- Philipp Koehn. 2004. Pharaoh: a beam search decoder for phrase-based statistical machine translation.
- Arul Menezes and Stephen Richardson. 2001. A best-first alignment for automatic extraction of transfer mappings from bilingual corpora.
- Franz Och, Christoph Tillmann and Hermann Ney. 1999. Improved Alignment Models for Statistical Machine Translation.
- Franz Och. 2003. Minimum error rate training in statistical machine translation.
- Kishore Papineni, Salim Roukos, Todd Ward and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation.
- Stefan Riezler, Tracy King, Ronald Kaplan, Richard Crouch, John Maxwell and Mark Johnson. 2002. Parsing the Wall Street Journal using LFG and Discriminative Estimation Techniques.
- Stefan Riezler and John Maxwell. 2006. Grammatical Machine Translation.
- Fei Xia and Michael McCord. 2004. Improving a statistical MT system with automatically learned rewrite patterns.