1
Advanced Techniques in NLP
  • Machine Translation III
  • Statistical MT

2
Approaching MT
  • There are many different ways of approaching the
    problem of MT.
  • The choice of approach is complex and depends
    upon:
  • Task requirements
  • Human resources
  • Linguistic resources

3
Criterial Issues
  • Do we want a translation system for one language
    pair or for many language pairs?
  • Can we assume a constrained vocabulary or do we
    need to deal with arbitrary text?
  • What resources exist for the languages that we
    are dealing with?
  • How long will it take to develop the resources,
    and what human resources will be required?

4
Parallel Data
  • Lots of translated text is available: hundreds of
    millions of words for some language pairs
  • a book has a few hundred thousand words
  • an educated person may read 10,000 words a day
  • 3.5 million words a year
  • 300 million in a lifetime
  • Computers can see more translated text than
    humans read in a lifetime
  • Machines can learn how to translate foreign
    languages.
  • (Koehn 2006)

5
Statistical Translation
  • Robust
  • Domain independent
  • Extensible
  • Does not require language specialists
  • Does require parallel texts
  • Uses noisy channel model of translation

6
Noisy Channel Model: Sentence Translation (Brown
et al. 1990)
[Diagram: a source sentence passes through a noisy channel to yield the target sentence]
7
Statistical Modelling
  • Learn P(f|e) from a parallel corpus
  • Not sufficient data to estimate P(f|e) directly
  • from Koehn 2006

8
The Problem of Translation
  • Given a sentence T of the target language, seek
    the source sentence S from which a translator
    produced T, i.e.
  • find S that maximises P(S|T)
  • By Bayes' theorem:
    P(S|T) = P(S) × P(T|S) / P(T)
  • whose denominator is independent of S.
  • Hence it suffices to maximise P(S) × P(T|S)

9
The Three Components of a Statistical MT model
  1. Method for computing language model probabilities
     P(S)
  2. Method for computing translation probabilities
     P(T|S)
  3. Method for searching amongst source sentences for
     one that maximises P(S) × P(T|S) (a minimal sketch
     follows)
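
To make the decomposition concrete, here is a minimal sketch in Python of how the three components fit together; lm_prob, tm_prob, and the candidate set are hypothetical stand-ins for a real language model, translation model, and search procedure.

  # Noisy-channel sketch: choose the source sentence S that
  # maximises P(S) * P(T|S) for the observed target sentence T.
  def decode(target, candidate_sources, lm_prob, tm_prob):
      # Brute-force search over an (assumed) candidate set;
      # a real decoder searches this space incrementally.
      return max(candidate_sources,
                 key=lambda s: lm_prob(s) * tm_prob(target, s))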

10
A Statistical MT System
[Diagram: a Source Language Model supplies P(S) and a Translation Model supplies P(T|S); given T, the Decoder outputs the S that maximises P(S) × P(T|S)]
11
Three Kinds of Model
12
Language Models based on N-Grams of Words
  • General: P(s1 s2 ... sn) = P(s1) P(s2|s1)
    ... P(sn|s1 ... s(n-1))
  • Trigram: P(s1 s2 ... sn) = P(s1) P(s2|s1) P(s3|s1,s2)
    ... P(sn|s(n-2),s(n-1))
  • Bigram: P(s1 s2 ... sn) = P(s1) P(s2|s1)
    ... P(sn|s(n-1)) (a minimal bigram sketch follows)
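
As an illustration, here is a minimal sketch of a bigram language model estimated with maximum-likelihood counts (no smoothing); the function names and toy corpus handling are assumptions for illustration.

  from collections import defaultdict

  def train_bigram_lm(sentences):
      # Maximum-likelihood estimate: P(w2|w1) = count(w1 w2) / count(w1).
      unigram = defaultdict(int)
      bigram = defaultdict(int)
      for s in sentences:
          words = ["<s>"] + s.split() + ["</s>"]
          for w1, w2 in zip(words, words[1:]):
              unigram[w1] += 1
              bigram[(w1, w2)] += 1
      return lambda w1, w2: bigram[(w1, w2)] / unigram[w1] if unigram[w1] else 0.0

  def sentence_prob(p, sentence):
      # P(s1 ... sn) = P(s1|<s>) * P(s2|s1) * ... * P(</s>|sn)
      words = ["<s>"] + sentence.split() + ["</s>"]
      prob = 1.0
      for w1, w2 in zip(words, words[1:]):
          prob *= p(w1, w2)
      return prob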

13
Syntax Based Language Models
  • Good syntax tree ⇒ good English
  • Allows for long-distance constraints
  • Left sentence (in the slide's example) is preferred
    by the syntax-based model

14
Word-Based Translation Models
  • Translation process is decomposed into smaller
    steps
  • Each step is tied to individual words
  • Based on IBM Models (Brown et al., 1993)
  • from Koehn 2006

15
Word TM derived from Bitext
  • ENGLISH
  • the cat sleeps
  • the dog sleeps
  • the horse eats
  • FRENCH
  • le chat dort
  • le chien dort
  • le cheval mange

16
le chat dort / the cat sleeps

Alignment counts after the first sentence pair:

          the  cat  dog  horse  sleeps  eats
  le       1    1    -     -      1      -
  chat     1    1    -     -      1      -
  chien    -    -    -     -      -      -
  cheval   -    -    -     -      -      -
  dort     1    1    -     -      1      -
  mange    -    -    -     -      -      -
17
le chien dort / the dog sleeps

Counts after the second sentence pair:

          the  cat  dog  horse  sleeps  eats
  le       2    1    1     -      2      -
  chat     1    1    -     -      1      -
  chien    1    -    1     -      1      -
  cheval   -    -    -     -      -      -
  dort     2    1    1     -      2      -
  mange    -    -    -     -      -      -
18
le cheval mange / the horse eats

Counts after the third sentence pair, with P(t|s) for s = le
(a counting sketch follows below):

          the  cat  dog  horse  sleeps  eats
  le       3    1    1     1      2      1
  chat     1    1    -     -      1      -
  chien    1    -    1     -      1      -
  cheval   1    -    -     1      -      1
  dort     2    1    1     -      2      -
  mange    1    -    -     1      -      1

  P(t|le)  3/9  1/9  1/9   1/9    2/9    1/9
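
A minimal sketch of how such a table is built: count co-occurrences of source and target words over the bitext, then normalise each source word's row to obtain P(t|s). The toy corpus is the one from the slides.

  from collections import defaultdict

  bitext = [("le chat dort", "the cat sleeps"),
            ("le chien dort", "the dog sleeps"),
            ("le cheval mange", "the horse eats")]

  # Count how often each (source word, target word) pair co-occurs
  # within aligned sentence pairs.
  counts = defaultdict(lambda: defaultdict(int))
  for src, tgt in bitext:
      for s in src.split():
          for t in tgt.split():
              counts[s][t] += 1

  # Normalise each row to a probability distribution P(t|s).
  p = {s: {t: c / sum(row.values()) for t, c in row.items()}
       for s, row in counts.items()}

  print(p["le"])   # the: 3/9, cat: 1/9, ..., matching the table above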
19
Parameter Estimation
  • Based on counting occurrences within monolingual
    and bilingual data.
  • For language model, we need only source language
    text.
  • For translation model, we need pairs of sentences
    that are translations of each other.
  • Use the EM (Expectation Maximisation) algorithm (Baum
    1972) to optimise model parameters.

20
EM Algorithm
  • Word alignments for the sentence pair ("a b c", "x y
    z") are formed from arbitrary pairings of words from
    the two sentences and include (a.x, b.y, c.z),
    (a.z, b.y, c.x), etc.
  • There is a large number of possible alignments,
    since we also allow, e.g., (ab.x, 0.y, c.z)

21
EM Algorithm
  • Make initial estimate of parameters. This can be
    used to compute the probability of any possible
    word alignment.
  • Re-estimate parameters by ranking each possible
    alignment by its probability according to initial
    guess.
  • Repeated iterations assign ever greater
    probability to the set of sentences actually
    observed.
  • Algorithm leads to a local maximum of the
    probability of observed sentence pairs as a
    function of the model parameters
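
A compact sketch of this loop for the simplest case (word translation probabilities only, as in IBM Model 1, with no fertility, distortion, or NULL word); this illustrates the E and M steps rather than reproducing Brown et al.'s full model.

  from collections import defaultdict

  def ibm_model1_em(bitext, iterations=10):
      # Initial estimate: uniform translation probabilities P(t|s).
      src_vocab = {s for src, _ in bitext for s in src.split()}
      tgt_vocab = {t for _, tgt in bitext for t in tgt.split()}
      p = {s: {t: 1.0 / len(tgt_vocab) for t in tgt_vocab} for s in src_vocab}
      for _ in range(iterations):
          count = defaultdict(lambda: defaultdict(float))
          total = defaultdict(float)
          for src, tgt in bitext:
              src_words, tgt_words = src.split(), tgt.split()
              for t in tgt_words:
                  # E-step: distribute a fractional count for t over the
                  # source words, weighted by the current estimates.
                  norm = sum(p[s][t] for s in src_words)
                  for s in src_words:
                      frac = p[s][t] / norm
                      count[s][t] += frac
                      total[s] += frac
          # M-step: re-estimate P(t|s) from the expected counts.
          for s in count:
              for t in count[s]:
                  p[s][t] = count[s][t] / total[s]
      return p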

22
Parameters for IBM Translation Model
  • Word Translation Probability P(t|s): probability
    that source word s is translated as target word
    t.
  • Fertility P(n|s): probability that source word s
    is translated by n target words (25 ≥ n ≥ 0).
  • Distortion P(i|j,l): probability that the source word
    at position j is translated by the target word at
    position i in a target sentence of length l.

23
Experiment 1 (Brown et al. 1990)
  • Hansard corpus: 40,000 pairs of sentences, approx.
    800,000 words in each language.
  • Considered the 9,000 most common words in each
    language.
  • Assumptions (initial parameter values):
  • each of the 9,000 target words equally likely as a
    translation of each of the source words
  • each of the fertilities from 0 to 25 equally
    likely for each of the 9,000 source words
  • each target position equally likely given each
    source position and target length

24
English: the

  French     Probability
  le            .610
  la            .178
  l'            .083
  les           .023
  ce            .013
  il            .012
  de            .009
  à             .007
  que           .007

  Fertility  Probability
  1             .871
  0             .124
  2             .004

25
English: not

  French        Probability
  pas              .469
  ne               .460
  non              .024
  pas du tout      .003
  faux             .003
  plus             .002
  ce               .002
  que              .002
  jamais           .002

  Fertility  Probability
  2             .758
  0             .133
  1             .106

26
English: hear

  French     Probability
  bravo         .992
  entendre      .005
  entendu       .002
  entends       .001

  Fertility  Probability
  0             .584
  1             .416

27
Sentence Translation Probability
  • Given a translation model for words, we can compute
    the translation probability of a sentence, taking the
    parameters into account (a worked sketch follows):
  • P(Jean aime Marie | John loves Mary) =
    P(Jean|John) × P(1|John) × P(1|1,3) ×
    P(aime|loves) × P(1|loves) × P(2|2,3) ×
    P(Marie|Mary) × P(1|Mary) × P(3|3,3)
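
A minimal sketch of this computation, with the three parameter tables represented as dictionaries; all probability values here are hypothetical placeholders, not Brown et al.'s estimates.

  # Hypothetical tables: word translation P(t|s), fertility P(n|s),
  # and distortion P(i|j,l); values are for illustration only.
  t_prob = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8,
            ("Marie", "Mary"): 0.9}
  fert = {("John", 1): 0.9, ("loves", 1): 0.9, ("Mary", 1): 0.9}
  dist = {(1, 1, 3): 0.8, (2, 2, 3): 0.8, (3, 3, 3): 0.8}

  # P(Jean aime Marie | John loves Mary) under the one-to-one alignment:
  pairs = [("Jean", "John"), ("aime", "loves"), ("Marie", "Mary")]
  prob = 1.0
  for i, (t, s) in enumerate(pairs, start=1):
      prob *= t_prob[(t, s)] * fert[(s, 1)] * dist[(i, i, 3)]
  print(prob)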

28
Flaws in Word-Based Translation
  • Model handles many-to-one translations P(t t t|s)
    but not one-to-many translations P(t|s s s)
  • e.g.
  • Zeitmangel erschwert das Problem .
  • (gloss: lack of time makes more difficult the
    problem .)
  • Correct translation: Lack of time makes the
    problem more difficult.
  • MT output: Time makes the problem.
  • (from Koehn 2006)

29
Flaws in Word-Based Translation (2)
  • Phrasal translation P(t t t|s s s s)
  • e.g. erübrigt sich / there is no point in
  • Eine Diskussion erübrigt sich demnach .
  • (gloss: a discussion is made unnecessary itself
    therefore .)
  • Correct translation: Therefore, there is no point
    in a discussion.
  • MT output: A debate turned therefore .
  • (from Koehn 2006)

30
Flaws in Word-Based Translation (3)
  • Syntactic transformations
  • Example: object/subject reordering
  • Den Vorschlag lehnt die Kommission ab
  • (gloss: the proposal rejects the commission off)
  • Correct translation: The commission rejects the
    proposal .
  • MT output: The proposal rejects the commission.
  • (from Koehn 2006)

31
Phrase Based Translation Models
  • Foreign input is segmented into phrases.
  • Phrases are any sequences of words, not
    necessarily linguistically motivated.
  • Each phrase is translated into English.
  • Phrases are reordered (a minimal sketch follows).
  • (from Koehn 2006)
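
A minimal sketch of these three steps with a hypothetical phrase table; segmentation and reordering are fixed by hand here, whereas a real decoder searches over both.

  # Hypothetical phrase table: foreign phrase -> English phrase.
  phrase_table = {"das ist": "this is", "ein": "a",
                  "kleines haus": "small house"}

  # 1. Segment the foreign input into phrases (fixed by hand here).
  segments = ["das ist", "ein", "kleines haus"]

  # 2. Translate each phrase; 3. reorder (identity order here).
  translation = " ".join(phrase_table[seg] for seg in segments)
  print(translation)   # this is a small house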

32
Syntax Based Translation Models
33
Word-Based Decoding: searching for the best
translation (Brown 1990)
  • Maintain a list of hypotheses.
  • Initial hypothesis: (Jean aime Marie | )
  • Search proceeds iteratively.
  • At each iteration we extend the most promising
    hypotheses with additional words:
    (Jean aime Marie | John(1))
    (Jean aime Marie | loves(2))
    (Jean aime Marie | Mary(3))
    (Jean aime Marie | Jean(1))
  • Parenthesised numbers indicate the corresponding
    position in the target sentence

34
Phrase-Based Decoding
  • Build translation left to right
  • select foreign word(s) to be translated
  • find English phrase translation
  • add English phrase to end of partial translation
  • Koehn 2006

35
Decoding Process
  • one-to-many translation
  • Koehn 2006

36
Decoding Process
  • many-to-one translation
  • Koehn 2006

37
Decoding Process
  • translation finished
  • Koehn 2006

38
Hypothesis Expansion
  • Start with the empty hypothesis
  • e: no English words
  • f: no foreign words covered
  • p: probability 1
  • (Koehn 2006)
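
A minimal sketch of the hypothesis record and one expansion step; the field names, back-pointer, and scoring are illustrative assumptions, not Pharaoh's internals.

  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class Hypothesis:
      english: str                          # e: English produced so far
      covered: frozenset                    # f: foreign positions covered
      prob: float                           # p: probability so far
      back: Optional["Hypothesis"] = None   # pointer to predecessor

  def expand(hyp, foreign_positions, english_phrase, phrase_prob):
      # Extend a hypothesis: cover more foreign words, append the
      # English phrase, and multiply in the phrase probability.
      return Hypothesis((hyp.english + " " + english_phrase).strip(),
                        hyp.covered | frozenset(foreign_positions),
                        hyp.prob * phrase_prob,
                        hyp)

  empty = Hypothesis("", frozenset(), 1.0)   # the empty hypothesis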

39
Hypothesis Expansion
40
Hypothesis Expansion
  • further hypothesis expansion
  • Koehn 2006

41
Decoding Process
  • Adding more hypotheses leads to an explosion of the
    search space.
  • Koehn 2006

42
Hypothesis Recombination
  • Sometimes different choices of hypothesis lead to
    the same translation result.
  • Such paths can be combined.
  • Koehn 2006

43
Hypothesis Recombination
  • Drop weaker path
  • Keep pointer from weaker path
  • Koehn 2006
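
A minimal sketch of recombination: hypotheses that agree on everything that matters for future expansion are merged, keeping the higher-probability path. Using the foreign coverage plus the last two English words as the state is an assumption (it matches a trigram language model context); the Hypothesis objects are those of the earlier sketch.

  def recombine(hypotheses):
      # Key: the state that determines all future expansions.
      best = {}
      for h in hypotheses:
          key = (h.covered, tuple(h.english.split()[-2:]))
          if key not in best or h.prob > best[key].prob:
              best[key] = h   # drop (or back-point to) the weaker path
      return list(best.values())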

44
Pruning
  • Hypothesis recombination is not sufficient
  • Heuristically discard weak hypotheses early
  • Organise hypotheses in stacks, e.g. by
  • same foreign words covered
  • same number of foreign words covered (Pharaoh
    does this)
  • same number of English words produced
  • Compare hypotheses in stacks, discard bad ones
  • histogram pruning: keep the top n hypotheses in each
    stack (e.g., n = 100)
  • threshold pruning: keep hypotheses that are at
    most α times the cost of the best hypothesis in the
    stack (e.g., α = 0.001); a minimal sketch follows
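
A minimal sketch of both pruning methods applied to one stack of the Hypothesis objects from the earlier sketch; treating scores as probabilities (higher is better) is an assumption, and a real decoder would add a future cost estimate before comparing.

  def prune(stack, n=100, alpha=0.001):
      # Threshold pruning: drop hypotheses whose probability falls
      # below alpha times that of the best hypothesis in the stack.
      best = max(h.prob for h in stack)
      stack = [h for h in stack if h.prob >= alpha * best]
      # Histogram pruning: keep only the top n hypotheses.
      return sorted(stack, key=lambda h: h.prob, reverse=True)[:n]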

45
Hypothesis Stacks
  • Organisation of hypotheses into stacks
  • here based on the number of foreign words translated
  • during translation all hypotheses from one stack
    are expanded
  • expanded hypotheses are placed into new stacks
  • (one hypothesis may feed several stacks, e.g. with
    one-to-many translation)
  • (Koehn 2006)

46
Comparing Hypotheses Covering the Same Number of
Foreign Words
  • A hypothesis that covers an easy part of the
    sentence is preferred
  • Need to consider the future cost of uncovered parts
  • Should take account of one-to-many translation
  • (Koehn 2006)

47
Future Cost Estimation
  • Use future cost estimates when pruning hypotheses
  • For each maximal contiguous uncovered span:
  • look up the precomputed future cost of the span
  • add it to the actually accumulated cost of the
    hypothesis when pruning
  • (Koehn 2006); a minimal sketch follows
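
A minimal sketch of precomputing a future cost table by dynamic programming; best_option_prob(i, j) is a hypothetical lookup returning the best translation-option probability for the foreign span [i, j), and real estimates would also include a language model component.

  import math

  def future_cost_table(length, best_option_prob):
      # fc[i][j] = estimated cost (negative log probability) of
      # translating the uncovered foreign span [i, j).
      fc = [[math.inf] * (length + 1) for _ in range(length + 1)]
      for width in range(1, length + 1):
          for i in range(length - width + 1):
              j = i + width
              p = best_option_prob(i, j)
              if p:
                  fc[i][j] = -math.log(p)
              # A span may also be covered as two cheaper sub-spans.
              for k in range(i + 1, j):
                  fc[i][j] = min(fc[i][j], fc[i][k] + fc[k][j])
      return fc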

48
Pharaoh
  • A beam search decoder for phrase-based models
  • works with various phrase-based models
  • beam search algorithm
  • time complexity roughly linear with input length
  • good quality takes about 1 second per sentence
  • Very good performance in DARPA/NIST evaluation
  • Freely available for researchers:
    http://www.isi.edu/licensed-sw/pharaoh/
  • Coming soon: open source version of Pharaoh

49
Pharaoh Demo
  • echo 'das ist ein kleines haus' | pharaoh -f
    pharaoh.ini > out
  • Pharaoh v1.2.9, written by Philipp Koehn
  • a beam search decoder for phrase-based
    statistical machine translation models
  • (c) 2002-2003 University of Southern California
  • (c) 2004 Massachusetts Institute of Technology
  • (c) 2005 University of Edinburgh, Scotland
  • loading language model from europarl.srilm
  • loading phrase translation table from
    phrase-table, stored 21, pruned 0, kept 21
  • loaded data structures in 2 seconds
  • reading input sentences
  • translating 1 sentences, translated 1 sentences
    in 0 seconds
  • cat out
  • this is a small house

50
Brown Experiment 2
  • Perform translation using the 1,000 most frequent
    words in the English corpus.
  • Use the 1,700 most frequently used French words in
    translations of sentences completely covered by
    the 1,000-word English vocabulary.
  • 117,000 pairs of sentences completely covered by
    both vocabularies.
  • Parameters of the English language model estimated
    from 570,000 sentences in the English part.

51
Experiment 2 (contd.)
  • 73 French sentences from elsewhere in the corpus
    were tested. Results were classified as:
  • Exact: same as the actual translation
  • Alternate: same meaning
  • Different: a legitimate translation but different
    meaning
  • Wrong: could not be interpreted as a translation
  • Ungrammatical: grammatically deficient
  • Corrections to the last three categories were
    made and keystrokes were counted

52
Results
  Category        Sentences   Percent
  Exact                4          5
  Alternate           18         25
  Different           13         18
  Wrong               11         15
  Ungrammatical       27         37

  Total               73        100
53
Results - Discussion
  • According to Brown et al., the system performed
    successfully 48% of the time (the first three
    categories).
  • 776 keystrokes were needed to repair the output,
    against 1,916 keystrokes to generate all 73
    translations from scratch (about 40% of the effort).
  • According to the authors, the system therefore
    reduces work by 60%.

54
Issues
  • Automatic evaluation methods
  • can computers decide what are good translations?
  • Phrase-based models
  • what are the atomic units of translation?
  • how are they discovered?
  • currently the best method in statistical machine
    translation
  • Discriminative training
  • what methods directly optimise translation
    performance?

55
The Speculative (Koehn 2006)
  • Syntax-based transfer models
  • how can we build models that take advantage of
    syntax?
  • how can we ensure that the output is grammatical?
  • Factored translation models
  • how can we integrate different levels of
    abstraction?

56
Bibliography
  • Statistical MT: Brown et al., "A Statistical
    Approach to MT", Computational Linguistics 16(2),
    1990, pp. 79-85 (search the ACL Anthology)
  • Koehn tutorial (see
    http://www.iccs.inf.ed.ac.uk/~pkoehn/)