Machine Translation


Provided by: michae266

Learn more at: https://people.cs.umass.edu


Transcript and Presenter's Notes

1
Machine Translation

A Presentation by
Julie Conlonova,
Rob Chase,
and Eric Pomerleau

2
Overview

Language Alignment System
Datasets
Sentence-aligned sets for training (ex. The
Hansards Corpus, European Parliamentary
Proceedings Parallel Corpus)
A word-aligned set for testing and evaluation to
measure accuracy and precision
Decoding

3
Language Alignment

Goal: Produce a word-aligned set from a
sentence-aligned dataset
First step on the road toward Statistical Machine
Translation
Example Problem:
The motion to adjourn the House is now deemed to
have been adopted.
La motion portant que la Chambre s'ajourne
maintenant est réputée adoptée.

4
IBM Models 1 and 2 (Kevin Knight, A Statistical
MT Tutorial Workbook, 1999)

Each capable of being used to produce a
word-aligned dataset separately.
EM Algorithm
Model 1 produces T-values based on normalized
fractional counting of corresponding words.
Additionally, Model 2 uses A-values for reverse
distortion probabilities, which are based
on the positions of the words
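
As a rough sketch of how the T-values and A-values combine, the alignment score can be written as below. The notation is an assumption borrowed from the usual IBM-model convention, not taken from the slides: e and f are the two sentences of a pair, with lengths l and m, and a_j is the position in e linked to position j in f.

```latex
% Model 1: alignment score built from T-values only
P(f, a \mid e) \propto \prod_{j=1}^{m} t(f_j \mid e_{a_j})

% Model 2: the same product, weighted by reverse distortion A-values
P(f, a \mid e) \propto \prod_{j=1}^{m} t(f_j \mid e_{a_j}) \, a(a_j \mid j, l, m)
```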

5
Training Data

European Parliament Proceedings Parallel Corpus
1996-2003
Aligned Languages
English - French
English - Dutch
English - Italian
English - Finnish
English - Portuguese
English - Spanish
English - Greek

6
Training Data cont.

Eliminated
Misaligned sentences
Sentences with 50 or more words
XML tags
Symbols and numerical characters other than
commas and periods
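
The filtering rules above could be sketched roughly as follows. This is a minimal illustration, not the presenters' actual preprocessing: the function names, the regular expressions, and the choice to treat a pair with an empty side as misaligned are all assumptions.

```python
import re

MAX_WORDS = 50  # sentences with 50 or more words are dropped, per the slide

TAG_RE = re.compile(r"<[^>]+>")             # XML tags such as <P>
SYMBOL_RE = re.compile(r"[^\w\s,.]|[\d_]")  # symbols and digits; commas and periods survive


def clean_sentence(sentence):
    """Strip XML tags, then symbols and digits other than commas and periods."""
    without_tags = TAG_RE.sub(" ", sentence)
    without_symbols = SYMBOL_RE.sub(" ", without_tags)
    return " ".join(without_symbols.split())


def clean_corpus(pairs):
    """Yield cleaned (source, target) pairs, skipping empty or overlong ones."""
    for source, target in pairs:
        source, target = clean_sentence(source), clean_sentence(target)
        if not source or not target:
            continue  # assumption: an empty side signals a misaligned pair
        if len(source.split()) >= MAX_WORDS or len(target.split()) >= MAX_WORDS:
            continue
        yield source, target
```

Note that this sketch also removes apostrophes, so a token like s'ajourne would be split in two; a real pipeline would need to decide how to handle that.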

7
Ideally
http://www.cs.berkeley.edu/klein/cs294-5
8
Bypassing Interlingua: Models I-III

Variables contributing to the probability of a
sentence
Correlation between words in the source/target
languages
Fertility of a word
Correlation between order of words in source
sentence and order of words in target

9
A Translation Matrix

        Rob   Cat   is    Dog
Rob     1     0     0     0
Gato    0     1     0     0
es      0     0     .5    0
esta    0     0     .5    0
Perro   0     0     0     1

10
Building the Translation Matrix: Starting from
Alignments

Find the sentence alignment
If a word in the source aligns with a word in the
target, then increment the translation matrix.
Normalize the translation matrix
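
The three steps above can be sketched as a count-then-normalize pass. This is only an illustration under assumptions: the input format (word lists plus a list of index links) and the function name are hypothetical, and each source word's counts are normalized so they sum to 1.

```python
from collections import defaultdict


def build_translation_matrix(aligned_pairs):
    """aligned_pairs: iterable of (source_words, target_words, links),
    where links is a list of (source_index, target_index) tuples."""
    counts = defaultdict(lambda: defaultdict(float))
    for source_words, target_words, links in aligned_pairs:
        for i, j in links:
            # Increment the cell for every aligned word pair.
            counts[source_words[i]][target_words[j]] += 1.0
    matrix = {}
    for source_word, row in counts.items():
        total = sum(row.values())
        # Normalize so each source word's translations sum to 1.
        matrix[source_word] = {t: c / total for t, c in row.items()}
    return matrix
```

With one sentence pair linking "is" to "es" and another linking "is" to "esta", the entry for "is" comes out as .5 for each, matching the pattern in the matrix on the previous slide.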

11
Can't Find Alignments

Most sentences in the Hansards corpus are 60
words long. Many are over 100.
100^100 possible alignments
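
To get a feel for the size of that number, here is a quick check. It assumes each of the 100 target words may link to any one of 100 source words, which is how the count on the slide appears to be derived; the function name is hypothetical.

```python
def count_alignments(source_len, target_len):
    """Number of ways to link each target word to exactly one source word."""
    return source_len ** target_len


n = count_alignments(100, 100)
print(len(str(n)))  # 201 digits, since 100**100 == 10**200
```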

12
Counting

Rob is a boy. Rob es nino.
Rob is tall. Rob es alto.
Eric is tall. Eric es alto.
Base counts on co-occurrence, weighting based on
sentence length.

13
Iterative Convergence

Use Expectation Maximization algorithm
Creates translation matrix

        Rob   is    tall  boy
Rob     .66   .33   .25   .25
es      .30   .66   .25   .25
alto    .2    .05   .5    0
nino    .2    .05   0     .5

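
A minimal sketch of the iterative procedure, in the style of IBM Model 1 with fractional counting, is shown below on the three sentence pairs from the Counting slide. It is an illustration under assumptions (no NULL word, uniform initialization, a fixed number of iterations, hypothetical names) and is not expected to reproduce the exact values in the table above.

```python
from collections import defaultdict

CORPUS = [
    ("Rob is a boy", "Rob es nino"),
    ("Rob is tall", "Rob es alto"),
    ("Eric is tall", "Eric es alto"),
]


def train_model1(corpus, iterations=10):
    """Return T-values t[(f, e)], the probability of word f given word e."""
    pairs = [(e.split(), f.split()) for e, f in corpus]
    f_vocab = {f for _, fs in pairs for f in fs}
    # Uniform start: every f is equally likely for every e.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # fractional counts for (f, e)
        total = defaultdict(float)  # fractional counts for e
        for es, fs in pairs:
            for f in fs:
                # Expectation: share one count for f among the words in es.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # Maximization: normalize the fractional counts per word e.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

Calling train_model1(CORPUS, 20) should leave "alto" as a more probable partner for "tall" than "es" is, because "es" is better explained by "is", which it accompanies in every sentence.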
14
Distorting the Sentence

Word order changes between languages
How is a sentence with 2 words distorted?
How is a sentence with 3 words distorted?
How is a sentence with
To keep track of this information we use

15
A tesseract!

(A quadruply nested default dictionary)
This could be a problem if there are more than
100 words in a sentence.
100 x 100 x 100 x 100 entries is too big for RAM
and takes too much time
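
A quadruply nested default dictionary might look like the sketch below. The key order (target length, source length, target position, source position) is an assumption for illustration; the slides do not say how the four levels were indexed.

```python
from collections import defaultdict


def make_distortion_table():
    """a[target_len][source_len][target_pos][source_pos] -> probability."""
    return defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    )


a = make_distortion_table()
a[3][3][0][0] += 0.5  # missing levels are created on first access

# A dense table over sentences of up to 100 words needs this many cells:
worst_case_cells = 100 ** 4  # 100,000,000
```

Only the cells that are actually touched are stored, but long sentences touch many of them, which is the memory problem the slide describes.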

16
Broad Look at MT

The translation process can be described simply
as
Decoding the meaning of the source text, and
Re-encoding this meaning in the target language.
- Translation Process, Wikipedia, May 2006

17
Decoding

How to go from the T-matrix and A-matrix to a
word alignment?
There are several approaches

18
Viterbi

If only doing alignment, much smaller memory and
time requirements.
Returns optimal path.
T-Matrix probabilities function as the emission
matrix
A-Matrix probabilities concerned with the
positioning of words
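
For alignment only, the best path can be found one target word at a time, because in Models 1 and 2 each word's link is scored independently of the others. The sketch below uses that per-word argmax rather than a full Viterbi lattice. The dictionary layouts, the function name, and the uniform fallback for missing A-values are assumptions for illustration.

```python
def best_alignment(source_words, target_words, t, a):
    """For each target position j, return the source position i that
    maximizes t[(target_word, source_word)] * a[(i, j, l, m)]."""
    l, m = len(source_words), len(target_words)
    alignment = []
    for j, f in enumerate(target_words):
        scores = [
            # Assumption: unseen word pairs score 0, unseen positions are uniform.
            t.get((f, e), 0.0) * a.get((i, j, l, m), 1.0 / l)
            for i, e in enumerate(source_words)
        ]
        alignment.append(max(range(l), key=scores.__getitem__))
    return alignment
```

Passing an empty dictionary for a reduces this to a T-matrix-only alignment, which is what Model 1 would produce.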

19
Decoding as a Translator

Without supplying a translated sentence to the
program, it is capable of being a stand-alone
translator instead of a word aligner.
However, while the Viterbi algorithm runs quickly
with pruning for decoding, for translating the
run time skyrockets.

20
Greedy Hill Climbing (Knight and Koehn, What's New
in Statistical Machine Translation, 2003)

Best first search
2-step look ahead to avoid getting stuck in most
probable local maxima

21
Beam Search (Knight and Koehn, What's New in
Statistical Machine Translation, 2003)

Optimization of Best First Search with heuristics
and beam of choices
Exponential tradeoff when increasing the beam
width

22
Other Decoding Methods (Knight and Koehn, What's New
in Statistical Machine Translation, 2003)

Finite State Transducer
Mapping between languages based on a finite
automaton
Parsing
String to Tree Model

23
Problem: One to Many

Necessary to take all alignments over a certain
probability in order to capture the probability
that e has fertility at least a given value

Al-Onaizan, Curin, Jahr, et al., Statistical
Machine Translation, 1999
24
Results

Study done in 2003 on word alignment error rates
in the Hansards corpus
Model 2
29.3% on 8K training sentence pairs
19.5% on 1.47M training sentence pairs
Optimized Model 6
20.3% on 8K training sentence pairs
8.7% on 1.47M training sentence pairs
Och and Ney, A Systematic Comparison of Various
Statistical Alignment Models, 2003
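
For reference, alignment error rate compares predicted links against hand-annotated "sure" and "possible" links. The sketch below follows the usual definition of the metric; the function name and the representation of links as (source_index, target_index) tuples are assumptions.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), where S is a subset of P."""
    a, s = set(predicted), set(sure)
    p = set(possible) | s  # sure links count as possible too
    if not a and not s:
        return 0.0  # assumption: nothing to align means no error
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
```

A prediction that matches the sure links exactly scores 0, and lower is better.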

25
Expected Accuracy

70% overall
Language performance
Dutch
French
Italian, Spanish, Portuguese
Greek
Finnish

26
Possible Future Work

Given more time, we would have implemented IBM
Model 3
Additionally uses n, p, and d parameters for
weighted alignments
N, fertility: number of words produced by one word
D, distortion
P, parameter involving words that aren't involved
directly
Invokes Model 2 for scoring

27
Another Possible Translation Scheme

Example-Based Machine Translation
Translation-by-Analogy
Can sometimes achieve better than the gist
translations from other models

28
Why Is Improving Machine Translation Necessary?
29
A Chinese to English Translation
30
The End
