Title: Extraction of Bilingual Information from Parallel Texts
1. Extraction of Bilingual Information from Parallel Texts
2. Outline
- Machine Translation
- Traditional vs. Statistical Architectures
- Experimental Results
- Conclusions
3. Translational Equivalence: a many-to-many relation

(Figure: alignment links between SOURCE and TARGET words)
4. Traditional Machine Translation
5. Remarks
- Character of the system:
  - Knowledge-based.
  - High-quality results if the domain is well delimited.
  - Knowledge takes the form of specialised rules (analysis, transfer, synthesis).
- Problems:
  - Limited coverage.
  - Knowledge acquisition bottleneck.
  - Extensibility.
6. Statistical Translation
- Robust
- Domain independent
- Extensible
- Does not require language specialists
- Uses noisy channel model of translation
7. Noisy Channel Model of Sentence Translation (Brown et al. 1990)

(Figure: a source sentence passes through a noisy channel, which emits the target sentence)
8. The Problem of Translation
- Given a sentence T of the target language, seek the sentence S from which a translator produced T, i.e. find the S that maximises P(S|T).
- By Bayes' theorem:
  P(S|T) = P(S) x P(T|S) / P(T)
- The denominator P(T) is independent of S.
- Hence it suffices to maximise P(S) x P(T|S).
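The Bayes decomposition above can be sketched as a toy decoder. Everything here is a hypothetical stand-in: in a real system `language_model_logprob` would come from a trained n-gram model and `translation_model_logprob` from a trained translation model.

```python
import math

def language_model_logprob(s):
    # Placeholder P(S): uniform over a tiny hypothetical candidate set.
    return math.log(1.0 / 3)

def translation_model_logprob(t, s):
    # Placeholder P(T|S): rewards word-count agreement between T and S.
    return -abs(len(t.split()) - len(s.split()))

def decode(t, candidates):
    # Maximise P(S) x P(T|S), the numerator of Bayes' theorem;
    # P(T) is constant over S and can be ignored.
    return max(candidates,
               key=lambda s: language_model_logprob(s)
                             + translation_model_logprob(t, s))
```

Working in log space turns the product P(S) x P(T|S) into a sum, which avoids floating-point underflow on long sentences.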
9. A Statistical MT System

(Figure: a Source Language Model supplies P(S) and a Translation Model supplies P(T|S); together they define P(S,T). Given T, a Decoder outputs S.)
10. The Three Components of a Statistical MT Model
- A method for computing language model probabilities P(S).
- A method for computing translation probabilities P(T|S).
- A method for searching amongst source sentences for one that maximises P(S) x P(T|S).
11. Probabilistic Language Models
- General: P(s1 s2 ... sn) = P(s1) P(s2|s1) ... P(sn|s1 ... s(n-1))
- Trigram: P(s1 s2 ... sn) = P(s1) P(s2|s1) P(s3|s1,s2) ... P(sn|s(n-2),s(n-1))
- Bigram: P(s1 s2 ... sn) = P(s1) P(s2|s1) ... P(sn|s(n-1))
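The bigram decomposition above can be sketched as follows, using maximum-likelihood counts from a toy corpus (the `<s>` start symbol is an assumption of this sketch, and no smoothing is applied):

```python
from collections import Counter

def train_bigram(corpus):
    # corpus: list of tokenised sentences; <s> marks sentence start.
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(sent, unigrams, bigrams):
    # P(s1 ... sn) = P(s1|<s>) * P(s2|s1) * ... * P(sn|s(n-1)),
    # with each factor estimated as count(prev, cur) / count(prev).
    p = 1.0
    for prev, cur in zip(["<s>"] + sent, sent):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p
```

Unsmoothed counts assign zero probability to any unseen bigram, which is why real language models add smoothing.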
12. A Simple Alignment-Based Translation Model
- Assumption: the target sentence is generated from the source sentence word by word.
  S: John loves Mary
  T: Jean aime Marie
13. Sentence Translation Probability
- According to this model, the translation probability of the sentence is just the product of the translation probabilities of the words.
- P(T|S) = P(Jean aime Marie | John loves Mary)
         = P(Jean|John) x P(aime|loves) x P(Marie|Mary)
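The word-by-word product above can be sketched directly; the probability table here is a hypothetical stand-in for a trained translation table:

```python
def sentence_translation_prob(target, source, t_table):
    # Word-by-word model: target word i is paired with source word i,
    # so P(T|S) is the product of the word translation probabilities.
    p = 1.0
    for t_word, s_word in zip(target, source):
        p *= t_table.get((t_word, s_word), 0.0)
    return p
```

Unknown word pairs get probability 0.0 here, so in practice the table would need smoothing or a floor value.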
14. More Realistic Example
  The proposal will not now be implemented
  Les propositions ne seront pas mises en application maintenant
15. Some Further Parameters
- Word translation probability: P(t|s)
- Fertility: the number of words in the target that are paired with each source word (0..N)
- Distortion: the difference in sentence position between the source word and the target word, P(i|j,l)
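These parameters combine to score one candidate alignment. The sketch below is a simplified scorer under assumed parameter tables (`t_prob`, `fert_prob`, `dist_prob` are hypothetical names, not from the original model definition):

```python
from collections import Counter

def alignment_score(target, source, alignment, t_prob, fert_prob, dist_prob):
    """Score one alignment: alignment[i] = j pairs target position i
    with source position j.
    t_prob[(t, s)]       word translation probability P(t|s)
    fert_prob[(s, n)]    probability source word s generates n target words
    dist_prob[(i, j, l)] distortion probability P(i|j,l), l = len(target)
    """
    p = 1.0
    fertility = Counter(alignment)  # how many target words each source word got
    for j, s_word in enumerate(source):
        p *= fert_prob[(s_word, fertility[j])]
    for i, t_word in enumerate(target):
        j = alignment[i]
        p *= t_prob[(t_word, source[j])] * dist_prob[(i, j, len(target))]
    return p
```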
16. Searching
- Maintain a list of hypotheses. Initial hypothesis: (Jean aime Marie | )
- The search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words:
  (Jean aime Marie | John(1))
  (Jean aime Marie | loves(2))
  (Jean aime Marie | Mary(3))
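One iteration of this search can be sketched as a beam step: extend each surviving hypothesis by one source word and keep only the best few. The scoring function is a hypothetical stand-in for the model score P(S) x P(T|S):

```python
def extend_hypotheses(hypotheses, vocabulary, score, beam=2):
    # One search iteration: extend every kept hypothesis by one candidate
    # source word, then retain only the `beam` highest-scoring extensions.
    extended = [hyp + [word] for hyp in hypotheses for word in vocabulary]
    return sorted(extended, key=score, reverse=True)[:beam]
```

Repeating this step grows source hypotheses word by word while the beam keeps the search tractable, at the cost of possibly pruning the true best sentence.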
17. Parameter Estimation
- In general: large quantities of data.
- For the language model, we need only source-language text.
- For the translation model, we need pairs of sentences that are translations of each other.
- Use the EM algorithm (Baum 1972) to optimise the model parameters.
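The EM estimation of word translation probabilities can be sketched in the style of IBM Model 1 (a simplification: fertility and distortion are ignored here). Starting from uniform t(t|s), the E step collects expected alignment counts and the M step renormalises them:

```python
from collections import defaultdict

def em_model1(pairs, iterations=10):
    # pairs: list of (source_words, target_words) sentence pairs.
    target_vocab = {t for _, tgt in pairs for t in tgt}
    t = defaultdict(lambda: 1.0 / len(target_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for t_word in tgt:
                # E step: distribute one expected count for t_word
                # over all source words, in proportion to t(t|s).
                norm = sum(t[(t_word, s_word)] for s_word in src)
                for s_word in src:
                    c = t[(t_word, s_word)] / norm
                    count[(t_word, s_word)] += c
                    total[s_word] += c
        for (t_word, s_word), c in count.items():
            t[(t_word, s_word)] = c / total[s_word]  # M step: renormalise
    return dict(t)
```

Even on two toy sentence pairs, co-occurrence statistics pull the probability mass toward the correct word pairings, which is the effect behind the "not"/"hear" tables on the next slides.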
18. Experiment (Brown et al. 1990)
- Hansard corpus: 40,000 pairs of sentences, approx. 800,000 words in each language.
- Considered the 9,000 most common words in each language.
- Assumptions (initial parameter values):
  - each of the 9,000 target words equally likely as a translation of each of the source words;
  - each of the fertilities from 0 to 25 equally likely for each of the 9,000 source words;
  - each target position equally likely given each source position and target length.
19. English: not

  French        Probability
  pas           .469
  ne            .460
  non           .024
  pas du tout   .003
  faux          .003
  plus          .002
  ce            .002
  que           .002
  jamais        .002

  Fertility     Probability
  2             .758
  0             .133
  1             .106
20. English: hear

  French        Probability
  bravo         .992
  entendre      .005
  entendu       .002
  entends       .001

  Fertility     Probability
  0             .584
  1             .416
21. Bajada 2003/4
- 400 sentence pairs from the Malta/EU accession treaty.
- Three different types of alignment:
  - Paragraph (precision 97%, recall 97%)
  - Sentence (precision 91%, recall 95%)
  - Word: 2 translation models
    - Model 1: distortion independent
    - Model 2: distortion dependent
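Precision and recall figures like those above can be computed by comparing proposed alignment links against a gold standard. A minimal sketch, assuming links are represented as (source index, target index) pairs:

```python
def precision_recall(proposed, reference):
    # proposed, reference: sets of alignment links, e.g. (src_idx, tgt_idx).
    correct = len(proposed & reference)
    return correct / len(proposed), correct / len(reference)
```

Precision asks how many proposed links are right; recall asks how many reference links were found.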
22. Bajada 2003/4 (continued)
23. Conclusion / Future Work
- Larger data sets.
- Finer models of word-to-word translation probabilities, taking into account:
  - fertility;
  - morphological variants of the same word.
- Role of, and tools for, a bilingual informant (not a linguistic specialist).