Title: CS60057 Speech and Natural Language Processing
1. CS60057: Speech and Natural Language Processing
Lecture 16, 3 Sep 2008
2. Outline for MT
- Intro and a little history
- Language Similarities and Divergences
- Three classic MT Approaches
- Transfer
- Interlingua
- Direct
- Modern Statistical MT
- Evaluation
3. What is MT?
- Translating a text from one language to another automatically.
4. Machine Translation
- dai yu zi zai chuang shang gan nian bao chai you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai.
- Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come
- As she lay there alone, Dai-yu's thoughts turned to Bao-chai. Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.
5. Machine Translation
6. Machine Translation
- The Story of the Stone
- The Dream of the Red Chamber (Cao Xueqin 1792)
- Issues
- Word segmentation
- Sentence segmentation: 4 English sentences to 1 Chinese
- Grammatical differences
- Chinese rarely marks tense
- "As", "turned to", "had begun"
- tou → penetrated
- Zero anaphora
- No articles
- Stylistic and cultural differences
- Bamboo tip plantain leaf → bamboos and plantains
- Ma curtain → curtains of her bed
- Rain sound sigh drop → insistent rustle of the rain
7. Not just literature
- Hansards: Canadian parliamentary proceedings
8. What is MT not good for?
- Really hard stuff
- Literature
- Natural spoken speech (meetings, court reporting)
- Really important stuff
- Medical translation in hospitals, 911
9. What is MT good for?
- Tasks for which a rough translation is fine
- Web pages, email
- Tasks for which MT can be post-edited
- MT as first pass
- Computer-aided human translation
- Tasks in sublanguage domains where high-quality MT is possible
- FAHQT: Fully Automatic High-Quality Translation
10. Sublanguage domain
- Weather forecasting
- "Cloudy with a chance of showers today and Thursday"
- "Low tonight 4"
- Can be modeled completely enough to use raw MT output
- Word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT
11. MT History
- 1946 Booth and Weaver discuss MT at Rockefeller Foundation in New York
- 1947-48 idea of dictionary-based direct translation
- 1949 Weaver memorandum popularized idea
- 1952 all 18 MT researchers in the world meet at MIT
- 1954 IBM/Georgetown Demo: Russian-English MT
- 1955-65 lots of labs take up MT
12. History of MT: Pessimism
- 1959/1960: Bar-Hillel "Report on the state of MT in US and GB"
- Argued FAHQT too hard (semantic ambiguity, etc.)
- Should work on semi-automatic instead of automatic
- His argument: "Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy."
- Only human knowledge lets us know that playpens are bigger than boxes, but writing pens are smaller
- His claim: we would have to encode all of human knowledge
13. History of MT: Pessimism
- The ALPAC report
- Headed by John R. Pierce of Bell Labs
- Conclusions
- Supply of human translators exceeds demand
- All the Soviet literature is already being translated
- MT has been a failure: all current MT work had to be post-edited
- Sponsored evaluations which showed that intelligibility and informativeness were worse than human translations
- Results
- MT research suffered
- Funding loss
- Number of research labs declined
- Association for Machine Translation and Computational Linguistics dropped MT from its name
14. History of MT
- 1976 Meteo, weather forecasts from English to French
- Systran (Babelfish) has been used for 40 years
- 1970s
- European focus in MT; mainly ignored in US
- 1980s
- ideas of using AI techniques in MT (KBMT, CMU)
- 1990s
- Commercial MT systems
- Statistical MT
- Speech-to-speech translation
15. Language Similarities and Divergences
- Some aspects of human language are universal or near-universal; others diverge greatly.
- Typology: the study of systematic cross-linguistic similarities and differences
- What are the dimensions along which human languages vary?
16. Morphological Variation
- Isolating languages
- Cantonese, Vietnamese: each word generally has one morpheme
- Vs. polysynthetic languages
- Siberian Yupik (Eskimo): a single word may have very many morphemes
- Agglutinative languages
- Turkish: morphemes have clean boundaries
- Vs. fusion languages
- Russian: a single affix may conflate many morphemes
17. Syntactic Variation
- SVO (Subject-Verb-Object) languages
- English, German, French, Mandarin
- SOV languages
- Japanese, Hindi
- VSO languages
- Irish, Classical Arabic
- SVO languages generally have prepositions: "to Yuriko"
- SOV languages generally have postpositions: "Yuriko ni"
18. Segmentation Variation
- Not every writing system has word boundaries marked
- Chinese, Japanese, Thai, Vietnamese
- Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences
- Modern Standard Arabic, Chinese
19. Inferential Load: cold vs. hot languages
- Some "cold" languages require the hearer to do more figuring out of who the various actors in the various events are
- Japanese, Chinese
- Other "hot" languages are pretty explicit about saying who did what to whom
- English
20. Inferential Load (2)
- All the noun phrases in blue do not appear in the Chinese text, but they are needed for a good translation
21. Lexical Divergences
- Word to phrases
- English "computer science" = French "informatique"
- POS divergences
- Eng. she likes/VERB to sing
- Ger. Sie singt gerne/ADV ("she sings gladly")
- Eng. I'm hungry/ADJ
- Sp. tengo hambre/NOUN ("I have hunger")
22. Lexical Divergences: Specificity
- Grammatical constraints
- English has gender on pronouns, Mandarin does not.
- So translating 3rd person from Chinese to English, we need to figure out the gender of the person!
- Similarly from English "they" to French "ils/elles"
- Semantic constraints
- English "brother"
- Mandarin "gege" (older) versus "didi" (younger)
- English "wall"
- German "Wand" (inside) vs. "Mauer" (outside)
- German "Berg"
- English "hill" or "mountain"
23. Lexical Divergence: many-to-many
24. Lexical Divergence: lexical gaps
- Japanese: no word for "privacy"
- English: no word for Cantonese "haauseun" or Japanese "oyakoko" (something like "filial piety")
- English "cow" versus "beef"; Cantonese "ngau" is used for both
25. Event-to-argument divergences
- English
- The bottle floated out.
- Spanish
- La botella salió flotando.
- "The bottle exited floating"
- Verb-framed languages mark direction of motion on the verb
- Spanish, French, Arabic, Hebrew, Japanese, Tamil; Polynesian, Mayan, Bantu families
- Satellite-framed languages mark direction of motion on the satellite
- Crawl out, float off, jump down, walk over to, run after
- Rest of Indo-European, Hungarian, Finnish, Chinese
26. Structural divergences
- G: Wir treffen uns am Mittwoch ("We meet ourselves on Wednesday")
- E: We'll meet on Wednesday
27. Head Swapping
- E: X swims across Y
- S: X cruza Y nadando ("X crosses Y swimming")
- E: I like to eat
- G: Ich esse gern ("I eat gladly")
- E: I'd prefer vanilla
- G: Mir wäre Vanille lieber ("To me vanilla would be preferable")
28. Thematic divergence
- S: Y me gusta ("Y pleases me")
- E: I like Y
- G: Mir fällt der Termin ein ("the date occurs to me")
- E: I remember the date
29. Divergence counts from Bonnie Dorr
- 32% of sentences in a UN Spanish/English corpus (5K sentences) diverge
- Categorial: X tener hambre / Y have hunger (98)
- Conflational: X dar puñaladas a Z / X stab Z (83)
- Structural: X entrar en Y / X enter Y (35)
- Head swapping: X cruzar Y nadando / X swim across Y (8)
- Thematic: X gustar a Y / Y likes X (6)
30. MT on the web
- Babelfish
- http://babelfish.altavista.com/
- Google
- http://www.google.com/search?hl=en&client=safari&rls=en&q=%221+taza+de+jugo%22+%28zumo%29+de+naranja+5+cucharadas+de+azucar+morena&btnG=Search
31. 3 methods for MT
- Direct
- Transfer
- Interlingua
32. Three MT Approaches: Direct, Transfer, Interlingual
33. Direct Translation
- Proceed word-by-word through text
- Translating each word
- No intermediate structures except morphology
- Knowledge is in the form of
- Huge bilingual dictionary
- word-to-word translation information
- After word translation, can do simple reordering
- Adjective ordering English → French/Spanish
34. Direct MT: Dictionary entry
35. Direct MT
36. Problems with direct MT
37. The Transfer Model
- Idea: apply contrastive knowledge, i.e., knowledge about the differences between two languages
- Steps
- Analysis: syntactically parse the source language
- Transfer: rules to turn this parse into a parse for the target language
- Generation: generate the target sentence from the parse tree
38. English to French
- Generally
- English: Adjective Noun
- French: Noun Adjective
- Note: not always true
- "Route mauvaise": bad road, badly-paved road
- "Mauvaise route": wrong road
- But it is a reasonable first approximation
- Rule:
39. Transfer rules
40. Lexical transfer
- Transfer-based systems also need lexical transfer rules
- Bilingual dictionary (as for direct MT)
- English "home"
- German
- nach Hause (going home)
- Heim (home game)
- Heimat (homeland, home country)
- zu Hause (at home)
- Can list "at home" ↔ "zu Hause"
- Or do Word Sense Disambiguation
41. Systran: combining direct and transfer
- Analysis
- Morphological analysis, POS tagging
- Chunking of NPs, PPs, phrases
- Shallow dependency parsing
- Transfer
- Translation of idioms
- Word sense disambiguation
- Assigning prepositions based on governing verbs
- Synthesis
- Apply rich bilingual dictionary
- Deal with reordering
- Morphological generation
42. Transfer: some problems
- N^2 sets of transfer rules (for N languages)!
- Grammar and lexicon full of language-specific stuff
- Hard to build, hard to maintain
43. Interlingua
- Intuition: instead of language-to-language transfer rules, use the meaning of the sentence to help
- Steps
- 1) translate the source sentence into a meaning representation
- 2) generate the target sentence from the meaning.
44. Interlingua for "Mary did not slap the green witch"
45. Interlingua
- Idea: some of the MT work that we need to do is part of other NLP tasks
- E.g., disambiguating E:book ↔ S:libro from E:book ↔ S:reservar
- So we could have concepts like BOOK-VOLUME and RESERVE and solve this problem once for each language
46. Direct MT: pros and cons (Bonnie Dorr)
- Pros
- Fast
- Simple
- Cheap
- No translation rules hidden in lexicon
- Cons
- Unreliable
- Not powerful
- Rule proliferation
- Requires lots of context
- Major restructuring after lexical substitution
47. Interlingual MT: pros and cons (B. Dorr)
- Pros
- Avoids the N^2 problem
- Easier to write rules
- Cons
- Semantics is HARD
- Useful information lost (paraphrase)
48. The impossibility of translation
- Hebrew "adonoi roi" ("the Lord is my shepherd") for a culture without sheep or shepherds
- Something fluent and understandable, but not faithful:
- "The Lord will look after me"
- Something faithful, but not fluent and natural:
- "The Lord is for me like somebody who looks after animals with cotton-like hair"
49. What makes a good translation
- Translators often talk about two factors we want to maximize
- Faithfulness or fidelity
- How close is the meaning of the translation to the meaning of the original?
- (Even better: does the translation cause the reader to draw the same inferences as the original would have?)
- Fluency or naturalness
- How natural the translation is, just considering its fluency in the target language
50. Statistical MT Systems
(Diagram: Spanish/English bilingual text and English text each feed a "statistical analysis" box; a Spanish input "Que hambre tengo yo" is mapped through broken-English candidates "What hunger have I", "Hungry I am so", "I am so hungry", "Have I that hunger" to the English output "I am so hungry".)
Slide from Kevin Knight
51. Statistical MT Systems
(Same diagram, with the components labeled: the bilingual text trains the Translation Model P(s|e), the English text trains the Language Model P(e), and the decoding algorithm computes argmax_e P(e) * P(s|e) to turn "Que hambre tengo yo" into "I am so hungry".)
Slide from Kevin Knight
52. Statistical MT: Faithfulness and Fluency formalized!
- Best translation of a source sentence S
- Developed by researchers who were originally in speech recognition at IBM
- Called the IBM model
53. Three Problems for Statistical MT
- Language model
- Given an English string e, assigns P(e) by formula
- good English string → high P(e)
- random word sequence → low P(e)
- Translation model
- Given a pair of strings <f,e>, assigns P(f|e) by formula
- <f,e> look like translations → high P(f|e)
- <f,e> don't look like translations → low P(f|e)
- Decoding algorithm
- Given a language model, a translation model, and a new sentence f, find the translation e maximizing P(e) * P(f|e)
Slide from Kevin Knight
54. The IBM model
- Hmm, those two factors might look familiar...
- Yup, it's Bayes' rule!
55. More formally
- Assume we are translating from a foreign language sentence F to an English sentence E
- F = f1, f2, f3, ..., fm
- We want to find the best English sentence E-hat = e1, e2, e3, ..., en
- E-hat = argmax_E P(E|F)
        = argmax_E P(F|E) P(E) / P(F)
        = argmax_E P(F|E) P(E)
- P(F|E): Translation Model; P(E): Language Model
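A minimal sketch of this decoding rule in Python; the two log-probability tables are hypothetical stand-ins for a trained language model and translation model (the numbers are invented for illustration):

```python
# Toy stand-ins for the two trained models (invented numbers).
log_p_e = {                      # language model log P(E)
    "I am so hungry": -2.0,
    "What hunger have I": -9.0,
}
log_p_f_given_e = {              # translation model log P(F|E)
    ("Que hambre tengo yo", "I am so hungry"): -3.5,
    ("Que hambre tengo yo", "What hunger have I"): -1.5,
}

def decode(f, candidates):
    """Return argmax_E P(F|E) * P(E), computed in log space."""
    return max(candidates,
               key=lambda e: log_p_f_given_e[(f, e)] + log_p_e[e])

print(decode("Que hambre tengo yo",
             ["I am so hungry", "What hunger have I"]))
# -> "I am so hungry": the word-for-word candidate is more "faithful"
#    but loses because its language-model score is so low.
```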
56. The noisy channel model for MT
57. Fluency: P(T)
- How to measure that this sentence:
- "That car was almost crash onto me"
- is less fluent than this one:
- "That car almost hit me."
- Answer: language models (N-grams!)
- For example P(hit | almost) > P(was | almost)
- But we can use any other more sophisticated model of grammar
- Advantage: this is monolingual knowledge! (See the bigram sketch below.)
58. Faithfulness: P(S|T)
- French: "ça me plaît" ("that me pleases")
- English:
- that pleases me (most fluent)
- I like it
- I'll take that one
- How to quantify this?
- Intuition: the degree to which words in one sentence are plausible translations of words in the other sentence
- Product of probabilities that each word in the target sentence would generate each word in the source sentence.
59. Faithfulness: P(S|T)
- Need to know, for every target language word, the probability of it mapping to every source language word.
- How do we learn these probabilities?
- Parallel texts!
- Often we have two texts that are translations of each other
- If we knew which word in the Source Text mapped to each word in the Target Text, we could just count (see the sketch below)!
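A sketch of "just count": given word-aligned pairs harvested from a parallel text (the tiny list below is hypothetical), the translation probabilities are plain relative frequencies:

```python
from collections import Counter

# Hypothetical (source, target) word pairs read off a word-aligned bitext.
aligned_pairs = [
    ("ça", "that"), ("me", "me"), ("plait", "pleases"),
    ("ça", "that"), ("me", "I"), ("plait", "like"),
]

pair_counts = Counter(aligned_pairs)
source_counts = Counter(s for s, _ in aligned_pairs)

def p_translate(t, s):
    """Relative frequency P(t | s) = count(s aligned to t) / count(s)."""
    return pair_counts[(s, t)] / source_counts[s]

print(p_translate("that", "ça"))     # 1.0
print(p_translate("like", "plait"))  # 0.5
```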
60. Faithfulness: P(S|T)
- Sentence alignment
- Figuring out which source language sentence maps to which target language sentence
- Word alignment
- Figuring out which source language word maps to which target language word
61. Big Point about Faithfulness and Fluency
- The job of the faithfulness model P(S|T) is just to model "bag of words": which words come from, say, English to Spanish.
- P(S|T) doesn't have to worry about internal facts about Spanish word order: that's the job of P(T)
- P(T) can do "bag generation": put the following words in order (from Kevin Knight)
- have programming a seen never I language better
- actual the hashing is since not collision-free usually the is less perfectly the of somewhat capacity table
62. P(T) and bag generation: the answer
- "Usually the actual capacity of the table is somewhat less, since the hashing is not perfectly collision-free"
- How about:
- loves Mary John
63. Three Problems for Statistical MT
- Language model
- Given an English string e, assigns P(e) by formula
- good English string → high P(e)
- random word sequence → low P(e)
- Translation model
- Given a pair of strings <f,e>, assigns P(f|e) by formula
- <f,e> look like translations → high P(f|e)
- <f,e> don't look like translations → low P(f|e)
- Decoding algorithm
- Given a language model, a translation model, and a new sentence f, find the translation e maximizing P(e) * P(f|e)
Slide from Kevin Knight
64. The Classic Language Model: Word N-Grams
- Goal of the language model: choose among
- He is on the soccer field
- He is in the soccer field
- Is table the on cup the
- The cup is on the table
- Rice shrine
- American shrine
- Rice company
- American company
Slide from Kevin Knight
65. Intuition of phrase-based translation (Koehn et al. 2003)
- Generative story has three steps
- Group words into phrases
- Translate each phrase
- Move the phrases around
66. Generative story again
- Group English source words into phrases e1, e2, ..., en
- Translate each English phrase ei into a Spanish phrase fj
- The probability of doing this is φ(fj | ei)
- Then (optionally) reorder each Spanish phrase
- We do this with a distortion probability
- A measure of distance between the positions of a corresponding phrase in the two languages
- What is the probability that a phrase in position X in the English sentence moves to position Y in the Spanish sentence?
67. Distortion probability
- The distortion probability is parameterized by a_i - b_{i-1}
- where a_i is the start position of the foreign (Spanish) phrase generated by the i-th English phrase e_i
- and b_{i-1} is the end position of the foreign (Spanish) phrase generated by the (i-1)-th English phrase e_{i-1}
- We'll call the distortion probability d(a_i - b_{i-1})
- And we'll have a really stupid model:
- d(a_i - b_{i-1}) = α^|a_i - b_{i-1}|
- where α is some small constant.
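That model is a one-liner. A small sketch, with α = 0.5 chosen arbitrarily for illustration:

```python
ALPHA = 0.5  # the "small constant" from the slide (value assumed here)

def d(a_i, b_prev):
    """d(a_i - b_{i-1}) = alpha ** |a_i - b_{i-1}|: placements near
    where the previous phrase ended are cheap, long jumps are penalized."""
    return ALPHA ** abs(a_i - b_prev)

print(d(4, 3))  # short move: 0.5
print(d(7, 3))  # long jump:  0.0625
```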
68. Final translation model for phrase-based MT
- P(F|E) = product over phrases i of φ(f_i | e_i) × d(a_i - b_{i-1})
- Let's look at a simple example with no distortion
69. Phrase-based MT
- Language model P(E)
- Translation model P(F|E)
- Model
- How to train the model
- Decoder: finding the sentence E that is most probable
70. Training P(F|E)
- What we mainly need to train is φ(fj | ei)
- Suppose we had a large bilingual training corpus
- A bitext
- In which each English sentence is paired with a Spanish sentence
- And suppose we knew exactly which phrase in Spanish was the translation of which phrase in the English
- We call this a phrase alignment
- If we had this, we could just count-and-divide (see the sketch below)
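A sketch of that count-and-divide step for φ(f|e), assuming a hypothetical set of phrase pairs read off a phrase-aligned bitext:

```python
from collections import Counter

# Hypothetical phrase pairs (English phrase, Spanish phrase).
phrase_pairs = [
    ("green witch", "bruja verde"),
    ("green witch", "bruja verde"),
    ("green witch", "verde bruja"),
]

pair_counts = Counter(phrase_pairs)
english_counts = Counter(e for e, _ in phrase_pairs)

def phi(f, e):
    """phi(f | e) = count(e, f) / count(e): plain relative frequency."""
    return pair_counts[(e, f)] / english_counts[e]

print(phi("bruja verde", "green witch"))  # 2/3
```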
71. But we don't have phrase alignments
- What we have instead are word alignments
72. Getting phrase alignments
- To get phrase alignments:
- We first get word alignments
- Then we "symmetrize" the word alignments into phrase alignments
73. How to get Word Alignments
- Word alignment: a mapping between the source words and the target words in a set of parallel sentences.
- Restriction: each foreign word comes from exactly 1 English word
- Advantage: we can represent an alignment by the index of the English word that the French word comes from
- The alignment above is thus 2,3,4,5,6,6,6
74. One addition: spurious words
- A word in the foreign sentence
- that doesn't align with any word in the English sentence
- is called a spurious word.
- We model these by pretending they are generated by an English word e0
75. More sophisticated models of alignment
76. Computing word alignments: IBM Model 1
- For phrase-based machine translation:
- We want a word alignment
- To extract a set of phrases
- A word alignment algorithm gives us P(F,E)
- We want this to train our phrase probabilities φ(fj | ei) as part of P(F|E)
- But a word-alignment algorithm can also be part of a mini-translation model itself.
77. IBM Model 1
78. IBM Model 1
79. How does the generative story assign P(F|E) for a Spanish sentence F?
- Terminology
- Suppose we had done steps 1 and 2, i.e. we already knew the Spanish length J and the alignment A (and the English source E)
80. Let's formalize steps 1 and 2
- We want P(A|E): the probability of an alignment A (of length J) given an English sentence E
- IBM Model 1 makes the (very) simplifying assumption that each alignment is equally likely
- How many possible alignments are there between an English sentence of length I and a Spanish sentence of length J?
- Hint: each Spanish word must come from one of the English source words (or the NULL word)
- (I+1)^J
- Let's assume the probability of choosing length J is a small constant epsilon
81. Model 1 continued
- Probability of choosing a length and then one of the possible alignments: P(A|E) = ε / (I+1)^J
- Combining with step 3 (translating each Spanish word given its aligned English word)
- gives the total probability of a given foreign sentence F
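Because every alignment is equally likely, the sum over all alignments factors into per-word sums, giving P(F|E) = ε / (I+1)^J × Π_j Σ_i t(f_j | e_i), with e_0 = NULL. A sketch with a made-up translation table t:

```python
EPSILON = 1.0  # the slide's small constant for choosing length J

def model1_prob(f_words, e_words, t):
    """IBM Model 1: P(F|E) = eps/(I+1)**J * prod_j sum_i t(f_j | e_i),
    where i ranges over the English words plus the NULL word e_0."""
    e_with_null = ["NULL"] + e_words
    I, J = len(e_words), len(f_words)
    prob = EPSILON / (I + 1) ** J
    for f in f_words:
        prob *= sum(t.get((f, e), 0.0) for e in e_with_null)
    return prob

# Hypothetical translation table t(f | e):
t = {("la", "the"): 0.7, ("la", "house"): 0.1, ("maison", "house"): 0.8}
print(model1_prob(["la", "maison"], ["the", "house"], t))  # ~0.071
```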
82. Decoding
- How do we find the best A?
83. Training alignment probabilities
- Step 1: get a parallel corpus
- Hansards
- Canadian parliamentary proceedings, in French and English
- Hong Kong Hansards: English and Chinese
- Step 2: sentence alignment
- Step 3: use EM (Expectation-Maximization) to train word alignments
84. Step 1: Parallel corpora
- Example from DE-News (8/1/1996)
English: Diverging opinions about planned tax reform
German: Unterschiedliche Meinungen zur geplanten Steuerreform
English: The discussion around the envisaged major tax reform continues.
German: Die Diskussion um die vorgesehene grosse Steuerreform dauert an.
English: The FDP economics expert, Graf Lambsdorff, today came out in favor of advancing the enactment of significant parts of the overhaul, currently planned for 1999.
German: Der FDP-Wirtschaftsexperte Graf Lambsdorff sprach sich heute dafuer aus, wesentliche Teile der fuer 1999 geplanten Reform vorzuziehen.
Slide from Christof Monz
85. Step 2: Sentence Alignment
- The old man is happy. He has fished many times. His wife talks to him. The fish are jumping. The sharks await.
- El viejo está feliz porque ha pescado muchos veces. Su mujer habla con él. Los tiburones esperan.
- Intuition:
- use length in words or chars
- together with dynamic programming
- or use a simpler MT model
Slide from Kevin Knight
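A toy version of the length-plus-dynamic-programming idea, in the spirit of Gale and Church but simplified: the cost of a bead is just the character-length difference of its two sides, and only 1-1, 1-2, and 2-1 beads are allowed (real aligners also handle deletions and use a probabilistic cost):

```python
def align(src_lens, tgt_lens):
    """DP over beads (1-1, 1-2, 2-1); cost = |length difference| of a bead."""
    INF = float("inf")
    n, m = len(src_lens), len(tgt_lens)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in ((1, 1), (1, 2), (2, 1)):
                if i >= di and j >= dj and cost[i - di][j - dj] < INF:
                    c = cost[i - di][j - dj] + abs(
                        sum(src_lens[i - di:i]) - sum(tgt_lens[j - dj:j]))
                    if c < cost[i][j]:
                        cost[i][j], back[i][j] = c, (di, dj)
    beads, i, j = [], n, m           # trace back the best bead sequence
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]

# Approximate character lengths of the 5 English / 3 Spanish sentences above:
print(align([21, 25, 22, 21, 17], [51, 22, 22]))
# -> [([0, 1], [0]), ([2], [1]), ([3, 4], [2])]
```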
86. Sentence Alignment
- The old man is happy.
- He has fished many times.
- His wife talks to him.
- The fish are jumping.
- The sharks await.
El viejo está feliz porque ha pescado muchos
veces. Su mujer habla con él. Los tiburones
esperan.
Slide from Kevin Knight
87. Sentence Alignment
- The old man is happy.
- He has fished many times.
- His wife talks to him.
- The fish are jumping.
- The sharks await.
El viejo está feliz porque ha pescado muchos
veces. Su mujer habla con él. Los tiburones
esperan.
Slide from Kevin Knight
88. Sentence Alignment
- The old man is happy. He has fished many times.
- His wife talks to him.
- The sharks await.
El viejo está feliz porque ha pescado muchos veces. Su mujer habla con él. Los tiburones esperan.
Note that unaligned sentences are thrown out, and sentences are merged in n-to-m alignments (n, m > 0).
Slide from Kevin Knight
89. Step 3: word alignments
- It turns out we can bootstrap alignments
- from a sentence-aligned bilingual corpus
- using the Expectation-Maximization or EM algorithm
90. EM for training alignment probs
la maison / la maison bleue / la fleur
the house / the blue house / the flower
- All word alignments equally likely
- All P(french-word | english-word) equally likely
Slide from Kevin Knight
91. EM for training alignment probs
la maison / la maison bleue / la fleur
the house / the blue house / the flower
- "la" and "the" are observed to co-occur frequently, so P(la | the) is increased.
Slide from Kevin Knight
92. EM for training alignment probs
la maison / la maison bleue / la fleur
the house / the blue house / the flower
- "house" co-occurs with both "la" and "maison", but P(maison | house) can be raised without limit, to 1.0, while P(la | house) is limited because of "the" (pigeonhole principle)
Slide from Kevin Knight
93. EM for training alignment probs
la maison / la maison bleue / la fleur
the house / the blue house / the flower
- settling down after another iteration
Slide from Kevin Knight
94. EM for training alignment probs
la maison / la maison bleue / la fleur
the house / the blue house / the flower
- Inherent hidden structure revealed by EM training!
- For details, see:
- Section 24.6.1 in the chapter
- "A Statistical MT Tutorial Workbook" (Knight, 1999)
- "The Mathematics of Statistical Machine Translation" (Brown et al., 1993)
- Software: GIZA++
Slide from Kevin Knight
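Here is a compact implementation of that EM loop for the Model 1 translation table t(f|e), run on the slide's three sentence pairs (a teaching sketch: no NULL word, no smoothing):

```python
from collections import defaultdict
from itertools import product

bitext = [("la maison", "the house"),
          ("la maison bleue", "the blue house"),
          ("la fleur", "the flower")]
bitext = [(f.split(), e.split()) for f, e in bitext]

f_vocab = {f for fs, _ in bitext for f in fs}
e_vocab = {e for _, es in bitext for e in es}

# Start uniform: all alignments, hence all t(f|e), equally likely.
t = {(f, e): 1.0 / len(f_vocab) for f, e in product(f_vocab, e_vocab)}

for _ in range(10):
    count = defaultdict(float)   # expected count(f, e)
    total = defaultdict(float)   # expected count(e)
    # E-step: collect fractional counts from every co-occurring pair.
    for fs, es in bitext:
        for f in fs:
            z = sum(t[(f, e)] for e in es)       # normalizer
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    # M-step: re-estimate t(f|e) = count(f, e) / count(e).
    for f, e in t:
        t[(f, e)] = count[(f, e)] / total[e]

print(round(t[("la", "the")], 2))        # rises toward 1.0
print(round(t[("maison", "house")], 2))  # rises toward 1.0
```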
95. Statistical Machine Translation
la maison / la maison bleue / la fleur
the house / the blue house / the flower
P(juste | fair) = 0.411, P(juste | correct) = 0.027, P(juste | right) = 0.020, ...
- A new French sentence yields possible English translations, to be rescored by the language model
Slide from Kevin Knight
96. A more complex model: IBM Model 3 (Brown et al., 1993)
Generative approach:
Mary did not slap the green witch
  n(3 | slap)   [fertility]
Mary not slap slap slap the green witch
  P-Null        [spurious-word insertion]
Mary not slap slap slap NULL the green witch
  t(la | the)   [word translation]
Maria no dió una bofetada a la verde bruja
  d(j | i)      [distortion]
Maria no dió una bofetada a la bruja verde
Probabilities can be learned from raw bilingual text.
97. How do we evaluate MT? Human tests for fluency
- Rating tests: give the raters a scale (1 to 5) and ask them to rate
- Or distinct scales for
- Clarity, Naturalness, Style
- Or check for specific problems
- Cohesion (lexical chains, anaphora, ellipsis)
- Hand-checking for cohesion
- Well-formedness
- 5-point scale of syntactic correctness
- Comprehensibility tests
- Noise test
- Multiple-choice questionnaire
- Readability tests
- cloze
98. How do we evaluate MT? Human tests for fidelity
- Adequacy
- Does it convey the information in the original?
- Ask raters to rate on a scale
- Bilingual raters: give them the source and target sentence, ask how much information is preserved
- Monolingual raters: give them the target plus a good human translation
- Informativeness
- Task-based: is there enough info to do some task?
- Give raters multiple-choice questions about content
99. Evaluating MT: Problems
- Asking humans to judge sentences on a 5-point scale for 10 factors takes time and money (weeks or months!)
- We can't build language engineering systems if we can only evaluate them once every quarter!!!!
- We need a metric that we can run every time we change our algorithm.
- It would be OK if it wasn't perfect, but it should tend to correlate with the expensive human metrics, which we could still run quarterly.
Bonnie Dorr
100. Automatic evaluation
- Miller and Beebe-Center (1958)
- Assume we have one or more human translations of the source passage
- Compare the automatic translation to these human translations
- BLEU
- NIST
- METEOR
- Precision/Recall
101. BiLingual Evaluation Understudy (BLEU; Papineni, 2001)
http://www.research.ibm.com/people/k/kishore/RC22176.pdf
- Automatic technique, but ...
- Requires the pre-existence of human (reference) translations
- Approach:
- Produce a corpus of high-quality human translations
- Judge closeness numerically (word-error rate)
- Compare n-gram matches between the candidate translation and 1 or more reference translations
Slide from Bonnie Dorr
102. BLEU Evaluation Metric (Papineni et al., ACL-2002)
Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.
- N-gram precision (score is between 0 and 1)
- What percentage of machine n-grams can be found in the reference translation?
- An n-gram is a sequence of n words
- Not allowed to use the same portion of the reference translation twice (can't cheat by typing out "the the the the the")
- Brevity penalty
- Can't just type out a single word "the" (precision 1.0!)
- Amazingly hard to "game the system" (i.e., find a way to change machine output so that BLEU goes up, but quality doesn't)
Machine translation: The American ? international airport and its the office all receives one calls self the sand Arab rich business ? and so on electronic mail, which sends out; The threat will be able after public place and so on the airport to start the biochemistry attack, ? highly alerts after the maintenance.
Slide from Bonnie Dorr
103. BLEU Evaluation Metric (Papineni et al., ACL-2002)
Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.
- BLEU4 formula (counts n-grams up to length 4):
- BLEU4 = exp(1.0 * log p1 + 0.5 * log p2 + 0.25 * log p3 + 0.125 * log p4 - max(words-in-reference / words-in-machine - 1, 0))
- p1 = 1-gram precision
- p2 = 2-gram precision
- p3 = 3-gram precision
- p4 = 4-gram precision
Machine translation: The American ? international airport and its the office all receives one calls self the sand Arab rich business ? and so on electronic mail, which sends out; The threat will be able after public place and so on the airport to start the biochemistry attack, ? highly alerts after the maintenance.
Slide from Bonnie Dorr
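The slide's formula drops straight into code. The precisions and lengths below are invented inputs; note that the weights 1.0/0.5/0.25/0.125 follow this slide, whereas standard BLEU uses uniform weights of 0.25:

```python
import math

def bleu4(p1, p2, p3, p4, ref_len, mt_len):
    """BLEU4 as on the slide: weighted log precisions minus a
    brevity term that only fires when the output is too short."""
    brevity = max(ref_len / mt_len - 1.0, 0.0)
    return math.exp(1.0 * math.log(p1) + 0.5 * math.log(p2)
                    + 0.25 * math.log(p3) + 0.125 * math.log(p4)
                    - brevity)

# Hypothetical n-gram precisions for a 40-word output vs. a 45-word reference:
print(round(bleu4(0.8, 0.5, 0.3, 0.2, ref_len=45, mt_len=40), 3))  # ~0.302
```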
104. Multiple Reference Translations
Slide from Bonnie Dorr
105. BLEU in Action
???????? (Foreign Original)
the gunman was shot to death by the police . (Reference Translation)
Machine translations:
1. the gunman was police kill .
2. wounded police jaya of
3. the gunman was shot dead by the police .
4. the gunman arrested by police kill .
5. the gunmen were killed .
6. the gunman was shot to death by the police .
7. gunmen were killed by police ?SUB>0 ?SUB>0
8. al by the police .
9. the ringer is killed by the police .
10. police killed the gunman .
Slide from Bonnie Dorr
106. BLEU in Action
???????? (Foreign Original)
the gunman was shot to death by the police . (Reference Translation)
Machine translations:
1. the gunman was police kill .
2. wounded police jaya of
3. the gunman was shot dead by the police .
4. the gunman arrested by police kill .
5. the gunmen were killed .
6. the gunman was shot to death by the police .
7. gunmen were killed by police ?SUB>0 ?SUB>0
8. al by the police .
9. the ringer is killed by the police .
10. police killed the gunman .
green = 4-gram match (good!); red = word not matched (bad!)
Slide from Bonnie Dorr
107. BLEU Comparison
Chinese-English Translation Example:
Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.
Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
Slide from Bonnie Dorr
108. How Do We Compute BLEU Scores?
- Intuition: what percentage of words in the candidate occurred in some human translation?
- Proposal: count up the number of candidate translation words (unigrams) in any reference translation, divide by the total number of words in the candidate translation
- But we can't just count the total number of overlapping N-grams!
- Candidate: the the the the the the
- Reference 1: The cat is on the mat
- Solution: a reference word should be considered exhausted after a matching candidate word is identified.
Slide from Bonnie Dorr
109. Modified n-gram precision
- For each word compute:
- (1) the total number of times it occurs in any single reference translation
- (2) the number of times it occurs in the candidate translation
- Instead of using count (2), use the minimum of (2) and (1), i.e. clip the counts at the maximum for the reference translation
- Now use that modified count
- And divide by the number of candidate words.
Slide from Bonnie Dorr
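A sketch of clipped ("modified") n-gram precision; the last line reproduces the 2/7 unigram answer from the "Catching Cheaters" example a few slides below:

```python
from collections import Counter

def modified_precision(candidate, references, n=1):
    """Clip each candidate n-gram count at its maximum count in any
    single reference, then divide by the total candidate n-gram count."""
    def ngrams(text):
        words = text.lower().split()
        return Counter(zip(*(words[i:] for i in range(n))))
    cand = ngrams(candidate)
    max_ref = Counter()
    for ref in references:
        for gram, c in ngrams(ref).items():
            max_ref[gram] = max(max_ref[gram], c)
    clipped = sum(min(c, max_ref[gram]) for gram, c in cand.items())
    return clipped, sum(cand.values())

refs = ["The cat is on the mat", "There is a cat on the mat"]
print(modified_precision("the the the the the the the", refs))  # (2, 7)
```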
110. Modified Unigram Precision: Candidate 1
It(1) is(1) a(1) guide(1) to(1) action(1) which(1) ensures(1) that(2) the(4) military(1) always(1) obeys(0) the commands(1) of(1) the party(1)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer??? 17/18
Slide from Bonnie Dorr
111. Modified Unigram Precision: Candidate 2
It(1) is(1) to(1) insure(0) the(4) troops(0) forever(1) hearing(0) the activity(0) guidebook(0) that(2) party(1) direct(0)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer???? 8/14
Slide from Bonnie Dorr
112. Modified Bigram Precision: Candidate 1
It is(1) is a(1) a guide(1) guide to(1) to action(1) action which(0) which ensures(0) ensures that(1) that the(1) the military(1) military always(0) always obeys(0) obeys the(0) the commands(0) commands of(0) of the(1) the party(1)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer???? 10/17
Slide from Bonnie Dorr
113. Modified Bigram Precision: Candidate 2
It is(1) is to(0) to insure(0) insure the(0) the troops(0) troops forever(0) forever hearing(0) hearing the(0) the activity(0) activity guidebook(0) guidebook that(0) that party(0) party direct(0)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer???? 1/13
Slide from Bonnie Dorr
114. Catching Cheaters
the(2) the the the(0) the(0) the(0) the(0)
Reference 1: The cat is on the mat
Reference 2: There is a cat on the mat
What's the unigram answer? 2/7
What's the bigram answer? 0/7
Slide from Bonnie Dorr
115. BLEU distinguishes human from machine translations
Slide from Bonnie Dorr
116. BLEU: problems with sentence length
- Candidate: "of the"
- Problem: its modified unigram precision is 2/2 and bigram precision 1/1!
- Solution: a brevity penalty prefers candidate translations which are the same length as one of the references
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
Slide from Bonnie Dorr
117. BLEU Tends to Predict Human Judgments
(variant of BLEU)
slide from G. Doddington (NIST)
118. Summary
- Intro and a little history
- Language Similarities and Divergences
- Four main MT Approaches
- Transfer
- Interlingua
- Direct
- Statistical
- Evaluation
119. Classes
- LINGUIST 139M/239M. Human and Machine Translation (Martin Kay)
- CS 224N. Natural Language Processing (Chris Manning)