Title: CS60057 Speech and Natural Language Processing
1. CS60057: Speech and Natural Language Processing
Lecture 16, 3 Sep 2008
2. Outline for MT
- Intro and a little history
- Language Similarities and Divergences
- Three classic MT Approaches
- Transfer
- Interlingua
- Direct
- Modern Statistical MT
- Evaluation
3. What is MT?
- Translating a text from one language to another automatically.
4. Machine Translation
- dai yu zi zai chuang shang gan nian bao chai you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai.
- Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come
- As she lay there alone, Dai-yu's thoughts turned to Bao-chai. Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.
5. Machine Translation
6. Machine Translation
- The Story of the Stone
- The Dream of the Red Chamber (Cao Xueqin 1792)
- Issues
- Word segmentation
- Sentence segmentation: 4 English sentences to 1 Chinese
- Grammatical differences
- Chinese rarely marks tense
- "As", "turned to", "had begun"
- tou → penetrated
- Zero anaphora
- No articles
- Stylistic and cultural differences
- Bamboo tip plantain leaf → bamboos and plantains
- Ma curtain → curtains of her bed
- Rain sound sigh drop → insistent rustle of the rain
7. Not just literature
- Hansards: Canadian parliamentary proceedings
8. What is MT not good for?
- Really hard stuff
- Literature
- Natural spoken speech (meetings, court reporting)
- Really important stuff
- Medical translation in hospitals, 911
9. What is MT good for?
- Tasks for which a rough translation is fine
- Web pages, email
- Tasks for which MT can be post-edited
- MT as first pass
- Computer-aided human translation
- Tasks in sublanguage domains where high-quality MT is possible
- FAHQT: Fully Automatic High-Quality Translation
10. Sublanguage domain
- Weather forecasting
- "Cloudy with a chance of showers today and Thursday"
- "Low tonight 4"
- Can be modeled completely enough to use raw MT output
- Word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT
11. MT History
- 1946 Booth and Weaver discuss MT at Rockefeller Foundation in New York
- 1947-48 idea of dictionary-based direct translation
- 1949 Weaver memorandum popularized idea
- 1952 all 18 MT researchers in the world meet at MIT
- 1954 IBM/Georgetown Demo: Russian-English MT
- 1955-65 lots of labs take up MT
12. History of MT: Pessimism
- 1959/1960: Bar-Hillel "Report on the state of MT in US and GB"
- Argued FAHQT too hard (semantic ambiguity, etc.)
- Should work on semi-automatic instead of automatic
- His argument: "Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy."
- Only human knowledge lets us know that playpens are bigger than boxes, but writing pens are smaller
- His claim: we would have to encode all of human knowledge
13. History of MT: Pessimism
- The ALPAC report
- Headed by John R. Pierce of Bell Labs
- Conclusions
- Supply of human translators exceeds demand
- All the Soviet literature is already being translated
- MT has been a failure: all current MT work had to be post-edited
- Sponsored evaluations which showed that intelligibility and informativeness were worse than human translations
- Results
- MT research suffered
- Funding loss
- Number of research labs declined
- Association for Machine Translation and Computational Linguistics dropped MT from its name
14. History of MT
- 1976 Meteo, weather forecasts from English to French
- Systran (Babelfish) has been used for 40 years
- 1970s
- European focus in MT; mainly ignored in US
- 1980s
- ideas of using AI techniques in MT (KBMT, CMU)
- 1990s
- Commercial MT systems
- Statistical MT
- Speech-to-speech translation
15. Language Similarities and Divergences
- Some aspects of human language are universal or near-universal; others diverge greatly.
- Typology: the study of systematic cross-linguistic similarities and differences
- What are the dimensions along which human languages vary?
16. Morphological Variation
- Isolating languages
- Cantonese, Vietnamese: each word generally has one morpheme
- Vs. polysynthetic languages
- Siberian Yupik (Eskimo): a single word may have very many morphemes
- Agglutinative languages
- Turkish: morphemes have clean boundaries
- Vs. fusion languages
- Russian: a single affix may conflate many morphemes
17. Syntactic Variation
- SVO (Subject-Verb-Object) languages
- English, German, French, Mandarin
- SOV languages
- Japanese, Hindi
- VSO languages
- Irish, Classical Arabic
- SVO languages generally have prepositions: "to Yuriko"
- SOV languages generally have postpositions: "Yuriko ni"
18. Segmentation Variation
- Not every writing system has word boundaries marked
- Chinese, Japanese, Thai, Vietnamese
- Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences
- Modern Standard Arabic, Chinese
19. Inferential Load: cold vs. hot languages
- Some "cold" languages require the hearer to do more figuring out of who the various actors in the various events are
- Japanese, Chinese
- Other "hot" languages are pretty explicit about saying who did what to whom
- English
20. Inferential Load (2)
- All the noun phrases in blue do not appear in the Chinese text, but they are needed for a good translation
21. Lexical Divergences
- Word to phrases
- English "computer science" = French "informatique"
- POS divergences
- Eng. she likes/VERB to sing
- Ger. Sie singt gerne/ADV ("she sings gladly")
- Eng. I'm hungry/ADJ
- Sp. tengo hambre/NOUN ("I have hunger")
22. Lexical Divergences: Specificity
- Grammatical constraints
- English has gender on pronouns, Mandarin does not.
- So translating 3rd person from Chinese to English, we need to figure out the gender of the person!
- Similarly from English "they" to French "ils/elles"
- Semantic constraints
- English "brother"
- Mandarin "gege" (older) versus "didi" (younger)
- English "wall"
- German "Wand" (inside) vs. "Mauer" (outside)
- German "Berg"
- English "hill" or "mountain"
23. Lexical Divergence: many-to-many
24. Lexical Divergence: lexical gaps
- Japanese: no word for "privacy"
- English: no word for Cantonese "haauseun" or Japanese "oyakoko" (something like "filial piety")
- English "cow" versus "beef"; Cantonese "ngau" is used for both
25. Event-to-argument divergences
- English
- The bottle floated out.
- Spanish
- La botella salió flotando.
- "The bottle exited floating"
- Verb-framed languages mark direction of motion on the verb
- Spanish, French, Arabic, Hebrew, Japanese, Tamil; Polynesian, Mayan, Bantu families
- Satellite-framed languages mark direction of motion on the satellite
- Crawl out, float off, jump down, walk over to, run after
- Rest of Indo-European, Hungarian, Finnish, Chinese
26. Structural divergences
- G: Wir treffen uns am Mittwoch ("We meet ourselves on Wednesday")
- E: We'll meet on Wednesday
27. Head Swapping
- E: X swims across Y
- S: X cruza Y nadando ("X crosses Y swimming")
- E: I like to eat
- G: Ich esse gern ("I eat gladly")
- E: I'd prefer vanilla
- G: Mir wäre Vanille lieber ("To me vanilla would be preferable")
28. Thematic divergence
- S: Y me gusta ("Y pleases me")
- E: I like Y
- G: Mir fällt der Termin ein ("the date occurs to me")
- E: I remember the date
29. Divergence counts from Bonnie Dorr
- 32% of sentences in a UN Spanish/English corpus (5K sentences) diverge
- Categorial: X tener hambre / Y have hunger (98)
- Conflational: X dar puñaladas a Z / X stab Z (83)
- Structural: X entrar en Y / X enter Y (35)
- Head swapping: X cruzar Y nadando / X swim across Y (8)
- Thematic: X gustar a Y / Y likes X (6)
30. MT on the web
- Babelfish
- http://babelfish.altavista.com/
- Google
- http://www.google.com/search?hl=en&client=safari&rls=en&q=%221+taza+de+jugo%22+%28zumo%29+de+naranja+5+cucharadas+de+azucar+morena&btnG=Search
31. 3 methods for MT
- Direct
- Transfer
- Interlingua
32. Three MT Approaches: Direct, Transfer, Interlingual
33. Direct Translation
- Proceed word-by-word through text
- Translating each word
- No intermediate structures except morphology
- Knowledge is in the form of
- Huge bilingual dictionary
- word-to-word translation information
- After word translation, can do simple reordering
- Adjective ordering English → French/Spanish
34. Direct MT: Dictionary entry
35. Direct MT
36. Problems with direct MT
37. The Transfer Model
- Idea: apply contrastive knowledge, i.e., knowledge about the differences between two languages
- Steps
- Analysis: syntactically parse the source language
- Transfer: rules to turn this parse into a parse for the target language
- Generation: generate the target sentence from the parse tree
38. English to French
- Generally
- English: Adjective Noun
- French: Noun Adjective
- Note: not always true
- "Route mauvaise": bad road, badly-paved road
- "Mauvaise route": wrong road
- But it is a reasonable first approximation
- Rule:
39. Transfer rules
40. Lexical transfer
- Transfer-based systems also need lexical transfer rules
- Bilingual dictionary (as for direct MT)
- English "home"
- German
- nach Hause (going home)
- Heim (home game)
- Heimat (homeland, home country)
- zu Hause (at home)
- Can list "at home" ↔ "zu Hause"
- Or do Word Sense Disambiguation
41. Systran: combining direct and transfer
- Analysis
- Morphological analysis, POS tagging
- Chunking of NPs, PPs, phrases
- Shallow dependency parsing
- Transfer
- Translation of idioms
- Word sense disambiguation
- Assigning prepositions based on governing verbs
- Synthesis
- Apply rich bilingual dictionary
- Deal with reordering
- Morphological generation
42. Transfer: some problems
- N^2 sets of transfer rules (for N languages)!
- Grammar and lexicon full of language-specific stuff
- Hard to build, hard to maintain
43. Interlingua
- Intuition: instead of language-to-language transfer rules, use the meaning of the sentence to help
- Steps
- 1) translate the source sentence into a meaning representation
- 2) generate the target sentence from the meaning.
44. Interlingua for "Mary did not slap the green witch"
45. Interlingua
- Idea: some of the MT work that we need to do is part of other NLP tasks
- E.g., disambiguating E:book ↔ S:libro from E:book ↔ S:reservar
- So we could have concepts like BOOK-VOLUME and RESERVE and solve this problem once for each language
46. Direct MT: pros and cons (Bonnie Dorr)
- Pros
- Fast
- Simple
- Cheap
- No translation rules hidden in lexicon
- Cons
- Unreliable
- Not powerful
- Rule proliferation
- Requires lots of context
- Major restructuring after lexical substitution
47. Interlingual MT: pros and cons (B. Dorr)
- Pros
- Avoids the N^2 problem
- Easier to write rules
- Cons
- Semantics is HARD
- Useful information lost (paraphrase)
48. The impossibility of translation
- Hebrew "adonoi roi" ("the Lord is my shepherd") for a culture without sheep or shepherds
- Something fluent and understandable, but not faithful:
- "The Lord will look after me"
- Something faithful, but not fluent and natural:
- "The Lord is for me like somebody who looks after animals with cotton-like hair"
49. What makes a good translation
- Translators often talk about two factors we want to maximize
- Faithfulness or fidelity
- How close is the meaning of the translation to the meaning of the original?
- (Even better: does the translation cause the reader to draw the same inferences as the original would have?)
- Fluency or naturalness
- How natural the translation is, just considering its fluency in the target language
50. Statistical MT Systems
(Diagram: Spanish/English bilingual text and English text each feed a "statistical analysis" box; a Spanish input "Que hambre tengo yo" is mapped through broken-English candidates "What hunger have I", "Hungry I am so", "I am so hungry", "Have I that hunger" to the English output "I am so hungry".)
Slide from Kevin Knight
51. Statistical MT Systems
(Same diagram, with the components labeled: the bilingual text trains the Translation Model P(s|e), the English text trains the Language Model P(e), and the decoding algorithm computes argmax_e P(e) * P(s|e) to turn "Que hambre tengo yo" into "I am so hungry".)
Slide from Kevin Knight
52. Statistical MT: Faithfulness and Fluency formalized!
- Best translation of a source sentence S
- Developed by researchers who were originally in speech recognition at IBM
- Called the IBM model
53. Three Problems for Statistical MT
- Language model
- Given an English string e, assigns P(e) by formula
- good English string → high P(e)
- random word sequence → low P(e)
- Translation model
- Given a pair of strings <f,e>, assigns P(f|e) by formula
- <f,e> look like translations → high P(f|e)
- <f,e> don't look like translations → low P(f|e)
- Decoding algorithm
- Given a language model, a translation model, and a new sentence f, find the translation e maximizing P(e) * P(f|e)
Slide from Kevin Knight
54. The IBM model
- Hmm, those two factors might look familiar...
- Yup, it's Bayes' rule!
55. More formally
- Assume we are translating from a foreign language sentence F to an English sentence E
- F = f1, f2, f3, ..., fm
- We want to find the best English sentence E-hat = e1, e2, e3, ..., en
- E-hat = argmax_E P(E|F)
        = argmax_E P(F|E) P(E) / P(F)
        = argmax_E P(F|E) P(E)
- P(F|E): Translation Model; P(E): Language Model
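A minimal sketch of this decoding rule in Python; the two log-probability tables are hypothetical stand-ins for a trained language model and translation model (the numbers are invented for illustration):

```python
# Toy stand-ins for the two trained models (invented numbers).
log_p_e = {                      # language model log P(E)
    "I am so hungry": -2.0,
    "What hunger have I": -9.0,
}
log_p_f_given_e = {              # translation model log P(F|E)
    ("Que hambre tengo yo", "I am so hungry"): -3.5,
    ("Que hambre tengo yo", "What hunger have I"): -1.5,
}

def decode(f, candidates):
    """Return argmax_E P(F|E) * P(E), computed in log space."""
    return max(candidates,
               key=lambda e: log_p_f_given_e[(f, e)] + log_p_e[e])

print(decode("Que hambre tengo yo",
             ["I am so hungry", "What hunger have I"]))
# -> "I am so hungry": the word-for-word candidate is more "faithful"
#    but loses because its language-model score is so low.
```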
56. The noisy channel model for MT
57. Fluency: P(T)
- How to measure that this sentence:
- "That car was almost crash onto me"
- is less fluent than this one:
- "That car almost hit me."
- Answer: language models (N-grams!)
- For example P(hit | almost) > P(was | almost)
- But we can use any other more sophisticated model of grammar
- Advantage: this is monolingual knowledge! (See the bigram sketch below.)
58. Faithfulness: P(S|T)
- French: "ça me plaît" ("that me pleases")
- English:
- that pleases me (most fluent)
- I like it
- I'll take that one
- How to quantify this?
- Intuition: the degree to which words in one sentence are plausible translations of words in the other sentence
- Product of probabilities that each word in the target sentence would generate each word in the source sentence.
59. Faithfulness: P(S|T)
- Need to know, for every target language word, the probability of it mapping to every source language word.
- How do we learn these probabilities?
- Parallel texts!
- Often we have two texts that are translations of each other
- If we knew which word in the Source Text mapped to each word in the Target Text, we could just count (see the sketch below)!
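A sketch of "just count": given word-aligned pairs harvested from a parallel text (the tiny list below is hypothetical), the translation probabilities are plain relative frequencies:

```python
from collections import Counter

# Hypothetical (source, target) word pairs read off a word-aligned bitext.
aligned_pairs = [
    ("ça", "that"), ("me", "me"), ("plait", "pleases"),
    ("ça", "that"), ("me", "I"), ("plait", "like"),
]

pair_counts = Counter(aligned_pairs)
source_counts = Counter(s for s, _ in aligned_pairs)

def p_translate(t, s):
    """Relative frequency P(t | s) = count(s aligned to t) / count(s)."""
    return pair_counts[(s, t)] / source_counts[s]

print(p_translate("that", "ça"))     # 1.0
print(p_translate("like", "plait"))  # 0.5
```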
60. Faithfulness: P(S|T)
- Sentence alignment
- Figuring out which source language sentence maps to which target language sentence
- Word alignment
- Figuring out which source language word maps to which target language word
61. Big Point about Faithfulness and Fluency
- The job of the faithfulness model P(S|T) is just to model "bag of words": which words come from, say, English to Spanish.
- P(S|T) doesn't have to worry about internal facts about Spanish word order: that's the job of P(T)
- P(T) can do "bag generation": put the following words in order (from Kevin Knight)
- have programming a seen never I language better
- actual the hashing is since not collision-free usually the is less perfectly the of somewhat capacity table
62. P(T) and bag generation: the answer
- "Usually the actual capacity of the table is somewhat less, since the hashing is not perfectly collision-free"
- How about:
- loves Mary John
63. Three Problems for Statistical MT
- Language model
- Given an English string e, assigns P(e) by formula
- good English string → high P(e)
- random word sequence → low P(e)
- Translation model
- Given a pair of strings <f,e>, assigns P(f|e) by formula
- <f,e> look like translations → high P(f|e)
- <f,e> don't look like translations → low P(f|e)
- Decoding algorithm
- Given a language model, a translation model, and a new sentence f, find the translation e maximizing P(e) * P(f|e)
Slide from Kevin Knight
64. The Classic Language Model: Word N-Grams
- Goal of the language model: choose among
- He is on the soccer field
- He is in the soccer field
- Is table the on cup the
- The cup is on the table
- Rice shrine
- American shrine
- Rice company
- American company
Slide from Kevin Knight
65. Intuition of phrase-based translation (Koehn et al. 2003)
- Generative story has three steps
- Group words into phrases
- Translate each phrase
- Move the phrases around
66. Generative story again
- Group English source words into phrases e1, e2, ..., en
- Translate each English phrase ei into a Spanish phrase fj
- The probability of doing this is φ(fj | ei)
- Then (optionally) reorder each Spanish phrase
- We do this with a distortion probability
- A measure of distance between the positions of a corresponding phrase in the two languages
- What is the probability that a phrase in position X in the English sentence moves to position Y in the Spanish sentence?
67. Distortion probability
- The distortion probability is parameterized by a_i - b_{i-1}
- where a_i is the start position of the foreign (Spanish) phrase generated by the i-th English phrase e_i
- and b_{i-1} is the end position of the foreign (Spanish) phrase generated by the (i-1)-th English phrase e_{i-1}
- We'll call the distortion probability d(a_i - b_{i-1})
- And we'll have a really stupid model:
- d(a_i - b_{i-1}) = α^|a_i - b_{i-1}|
- where α is some small constant.
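That model is a one-liner. A small sketch, with α = 0.5 chosen arbitrarily for illustration:

```python
ALPHA = 0.5  # the "small constant" from the slide (value assumed here)

def d(a_i, b_prev):
    """d(a_i - b_{i-1}) = alpha ** |a_i - b_{i-1}|: placements near
    where the previous phrase ended are cheap, long jumps are penalized."""
    return ALPHA ** abs(a_i - b_prev)

print(d(4, 3))  # short move: 0.5
print(d(7, 3))  # long jump:  0.0625
```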
68. Final translation model for phrase-based MT
- P(F|E) = product over phrases i of φ(f_i | e_i) × d(a_i - b_{i-1})
- Let's look at a simple example with no distortion
69. Phrase-based MT
- Language model P(E)
- Translation model P(F|E)
- Model
- How to train the model
- Decoder: finding the sentence E that is most probable
70. Training P(F|E)
- What we mainly need to train is φ(fj | ei)
- Suppose we had a large bilingual training corpus
- A bitext
- In which each English sentence is paired with a Spanish sentence
- And suppose we knew exactly which phrase in Spanish was the translation of which phrase in the English
- We call this a phrase alignment
- If we had this, we could just count-and-divide (see the sketch below)
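A sketch of that count-and-divide step for φ(f|e), assuming a hypothetical set of phrase pairs read off a phrase-aligned bitext:

```python
from collections import Counter

# Hypothetical phrase pairs (English phrase, Spanish phrase).
phrase_pairs = [
    ("green witch", "bruja verde"),
    ("green witch", "bruja verde"),
    ("green witch", "verde bruja"),
]

pair_counts = Counter(phrase_pairs)
english_counts = Counter(e for e, _ in phrase_pairs)

def phi(f, e):
    """phi(f | e) = count(e, f) / count(e): plain relative frequency."""
    return pair_counts[(e, f)] / english_counts[e]

print(phi("bruja verde", "green witch"))  # 2/3
```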
71. But we don't have phrase alignments
- What we have instead are word alignments
72. Getting phrase alignments
- To get phrase alignments:
- We first get word alignments
- Then we "symmetrize" the word alignments into phrase alignments
73. How to get Word Alignments
- Word alignment: a mapping between the source words and the target words in a set of parallel sentences.
- Restriction: each foreign word comes from exactly 1 English word
- Advantage: we can represent an alignment by the index of the English word that the French word comes from
- The alignment above is thus 2,3,4,5,6,6,6
74. One addition: spurious words
- A word in the foreign sentence
- that doesn't align with any word in the English sentence
- is called a spurious word.
- We model these by pretending they are generated by an English word e0
75. More sophisticated models of alignment
76. Computing word alignments: IBM Model 1
- For phrase-based machine translation:
- We want a word alignment
- To extract a set of phrases
- A word alignment algorithm gives us P(F,E)
- We want this to train our phrase probabilities φ(fj | ei) as part of P(F|E)
- But a word-alignment algorithm can also be part of a mini-translation model itself.
77. IBM Model 1
78. IBM Model 1
79. How does the generative story assign P(F|E) for a Spanish sentence F?
- Terminology
- Suppose we had done steps 1 and 2, i.e. we already knew the Spanish length J and the alignment A (and the English source E)
80. Let's formalize steps 1 and 2
- We want P(A|E): the probability of an alignment A (of length J) given an English sentence E
- IBM Model 1 makes the (very) simplifying assumption that each alignment is equally likely
- How many possible alignments are there between an English sentence of length I and a Spanish sentence of length J?
- Hint: each Spanish word must come from one of the English source words (or the NULL word)
- (I+1)^J
- Let's assume the probability of choosing length J is a small constant epsilon
81. Model 1 continued
- Probability of choosing a length and then one of the possible alignments: P(A|E) = ε / (I+1)^J
- Combining with step 3 (translating each Spanish word given its aligned English word)
- gives the total probability of a given foreign sentence F
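Because every alignment is equally likely, the sum over all alignments factors into per-word sums, giving P(F|E) = ε / (I+1)^J × Π_j Σ_i t(f_j | e_i), with e_0 = NULL. A sketch with a made-up translation table t:

```python
EPSILON = 1.0  # the slide's small constant for choosing length J

def model1_prob(f_words, e_words, t):
    """IBM Model 1: P(F|E) = eps/(I+1)**J * prod_j sum_i t(f_j | e_i),
    where i ranges over the English words plus the NULL word e_0."""
    e_with_null = ["NULL"] + e_words
    I, J = len(e_words), len(f_words)
    prob = EPSILON / (I + 1) ** J
    for f in f_words:
        prob *= sum(t.get((f, e), 0.0) for e in e_with_null)
    return prob

# Hypothetical translation table t(f | e):
t = {("la", "the"): 0.7, ("la", "house"): 0.1, ("maison", "house"): 0.8}
print(model1_prob(["la", "maison"], ["the", "house"], t))  # ~0.071
```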
82. Decoding
- How do we find the best A?
83. Training alignment probabilities
- Step 1: get a parallel corpus
- Hansards
- Canadian parliamentary proceedings, in French and English
- Hong Kong Hansards: English and Chinese
- Step 2: sentence alignment
- Step 3: use EM (Expectation-Maximization) to train word alignments
84. Step 1: Parallel corpora
- Example from DE-News (8/1/1996)
English: Diverging opinions about planned tax reform
German: Unterschiedliche Meinungen zur geplanten Steuerreform
English: The discussion around the envisaged major tax reform continues.
German: Die Diskussion um die vorgesehene grosse Steuerreform dauert an.
English: The FDP economics expert, Graf Lambsdorff, today came out in favor of advancing the enactment of significant parts of the overhaul, currently planned for 1999.
German: Der FDP-Wirtschaftsexperte Graf Lambsdorff sprach sich heute dafuer aus, wesentliche Teile der fuer 1999 geplanten Reform vorzuziehen.
Slide from Christof Monz
85. Step 2: Sentence Alignment
- The old man is happy. He has fished many times. His wife talks to him. The fish are jumping. The sharks await.
- El viejo está feliz porque ha pescado muchos veces. Su mujer habla con él. Los tiburones esperan.
- Intuition:
- use length in words or chars
- together with dynamic programming
- or use a simpler MT model
Slide from Kevin Knight
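A toy version of the length-plus-dynamic-programming idea, in the spirit of Gale and Church but simplified: the cost of a bead is just the character-length difference of its two sides, and only 1-1, 1-2, and 2-1 beads are allowed (real aligners also handle deletions and use a probabilistic cost):

```python
def align(src_lens, tgt_lens):
    """DP over beads (1-1, 1-2, 2-1); cost = |length difference| of a bead."""
    INF = float("inf")
    n, m = len(src_lens), len(tgt_lens)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in ((1, 1), (1, 2), (2, 1)):
                if i >= di and j >= dj and cost[i - di][j - dj] < INF:
                    c = cost[i - di][j - dj] + abs(
                        sum(src_lens[i - di:i]) - sum(tgt_lens[j - dj:j]))
                    if c < cost[i][j]:
                        cost[i][j], back[i][j] = c, (di, dj)
    beads, i, j = [], n, m           # trace back the best bead sequence
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]

# Approximate character lengths of the 5 English / 3 Spanish sentences above:
print(align([21, 25, 22, 21, 17], [51, 22, 22]))
# -> [([0, 1], [0]), ([2], [1]), ([3, 4], [2])]
```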
86. Sentence Alignment
- The old man is happy.
- He has fished many times.
- His wife talks to him.
- The fish are jumping.
- The sharks await.
El viejo está feliz porque ha pescado muchos
veces. Su mujer habla con él. Los tiburones
esperan.
Slide from Kevin Knight
87. Sentence Alignment
- The old man is happy.
- He has fished many times.
- His wife talks to him.
- The fish are jumping.
- The sharks await.
El viejo está feliz porque ha pescado muchos
veces. Su mujer habla con él. Los tiburones
esperan.
Slide from Kevin Knight
88. Sentence Alignment
- The old man is happy. He has fished many times.
- His wife talks to him.
- The sharks await.
El viejo está feliz porque ha pescado muchos veces. Su mujer habla con él. Los tiburones esperan.
Note that unaligned sentences are thrown out, and sentences are merged in n-to-m alignments (n, m > 0).
Slide from Kevin Knight
89. Step 3: word alignments
- It turns out we can bootstrap alignments
- from a sentence-aligned bilingual corpus
- using the Expectation-Maximization or EM algorithm
90. EM for training alignment probs
la maison / la maison bleue / la fleur
the house / the blue house / the flower
- All word alignments equally likely
- All P(french-word | english-word) equally likely
Slide from Kevin Knight
91. EM for training alignment probs
la maison / la maison bleue / la fleur
the house / the blue house / the flower
- "la" and "the" are observed to co-occur frequently, so P(la | the) is increased.
Slide from Kevin Knight
92. EM for training alignment probs
la maison / la maison bleue / la fleur
the house / the blue house / the flower
- "house" co-occurs with both "la" and "maison", but P(maison | house) can be raised without limit, to 1.0, while P(la | house) is limited because of "the" (pigeonhole principle)
Slide from Kevin Knight
93. EM for training alignment probs
la maison / la maison bleue / la fleur
the house / the blue house / the flower
- settling down after another iteration
Slide from Kevin Knight
94. EM for training alignment probs
la maison / la maison bleue / la fleur
the house / the blue house / the flower
- Inherent hidden structure revealed by EM training!
- For details, see:
- Section 24.6.1 in the chapter
- "A Statistical MT Tutorial Workbook" (Knight, 1999)
- "The Mathematics of Statistical Machine Translation" (Brown et al., 1993)
- Software: GIZA++
Slide from Kevin Knight
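Here is a compact implementation of that EM loop for the Model 1 translation table t(f|e), run on the slide's three sentence pairs (a teaching sketch: no NULL word, no smoothing):

```python
from collections import defaultdict
from itertools import product

bitext = [("la maison", "the house"),
          ("la maison bleue", "the blue house"),
          ("la fleur", "the flower")]
bitext = [(f.split(), e.split()) for f, e in bitext]

f_vocab = {f for fs, _ in bitext for f in fs}
e_vocab = {e for _, es in bitext for e in es}

# Start uniform: all alignments, hence all t(f|e), equally likely.
t = {(f, e): 1.0 / len(f_vocab) for f, e in product(f_vocab, e_vocab)}

for _ in range(10):
    count = defaultdict(float)   # expected count(f, e)
    total = defaultdict(float)   # expected count(e)
    # E-step: collect fractional counts from every co-occurring pair.
    for fs, es in bitext:
        for f in fs:
            z = sum(t[(f, e)] for e in es)       # normalizer
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    # M-step: re-estimate t(f|e) = count(f, e) / count(e).
    for f, e in t:
        t[(f, e)] = count[(f, e)] / total[e]

print(round(t[("la", "the")], 2))        # rises toward 1.0
print(round(t[("maison", "house")], 2))  # rises toward 1.0
```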
95. Statistical Machine Translation
la maison / la maison bleue / la fleur
the house / the blue house / the flower
P(juste | fair) = 0.411, P(juste | correct) = 0.027, P(juste | right) = 0.020, ...
- A new French sentence yields possible English translations, to be rescored by the language model
Slide from Kevin Knight
96. A more complex model: IBM Model 3 (Brown et al., 1993)
Generative approach:
Mary did not slap the green witch
  n(3 | slap)   [fertility]
Mary not slap slap slap the green witch
  P-Null        [spurious-word insertion]
Mary not slap slap slap NULL the green witch
  t(la | the)   [word translation]
Maria no dió una bofetada a la verde bruja
  d(j | i)      [distortion]
Maria no dió una bofetada a la bruja verde
Probabilities can be learned from raw bilingual text.
97. How do we evaluate MT? Human tests for fluency
- Rating tests: give the raters a scale (1 to 5) and ask them to rate
- Or distinct scales for
- Clarity, Naturalness, Style
- Or check for specific problems
- Cohesion (lexical chains, anaphora, ellipsis)
- Hand-checking for cohesion
- Well-formedness
- 5-point scale of syntactic correctness
- Comprehensibility tests
- Noise test
- Multiple-choice questionnaire
- Readability tests
- cloze
98. How do we evaluate MT? Human tests for fidelity
- Adequacy
- Does it convey the information in the original?
- Ask raters to rate on a scale
- Bilingual raters: give them the source and target sentence, ask how much information is preserved
- Monolingual raters: give them the target plus a good human translation
- Informativeness
- Task-based: is there enough info to do some task?
- Give raters multiple-choice questions about content
99. Evaluating MT: Problems
- Asking humans to judge sentences on a 5-point scale for 10 factors takes time and money (weeks or months!)
- We can't build language engineering systems if we can only evaluate them once every quarter!!!!
- We need a metric that we can run every time we change our algorithm.
- It would be OK if it wasn't perfect, but it should tend to correlate with the expensive human metrics, which we could still run quarterly.
Bonnie Dorr
100. Automatic evaluation
- Miller and Beebe-Center (1958)
- Assume we have one or more human translations of the source passage
- Compare the automatic translation to these human translations
- BLEU
- NIST
- METEOR
- Precision/Recall
101. BiLingual Evaluation Understudy (BLEU; Papineni, 2001)
http://www.research.ibm.com/people/k/kishore/RC22176.pdf
- Automatic technique, but ...
- Requires the pre-existence of human (reference) translations
- Approach:
- Produce a corpus of high-quality human translations
- Judge closeness numerically (word-error rate)
- Compare n-gram matches between the candidate translation and 1 or more reference translations
Slide from Bonnie Dorr
102. BLEU Evaluation Metric (Papineni et al., ACL-2002)
Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.
- N-gram precision (score is between 0 and 1)
- What percentage of machine n-grams can be found in the reference translation?
- An n-gram is a sequence of n words
- Not allowed to use the same portion of the reference translation twice (can't cheat by typing out "the the the the the")
- Brevity penalty
- Can't just type out a single word "the" (precision 1.0!)
- Amazingly hard to "game the system" (i.e., find a way to change machine output so that BLEU goes up, but quality doesn't)
Machine translation: The American ? international airport and its the office all receives one calls self the sand Arab rich business ? and so on electronic mail, which sends out; The threat will be able after public place and so on the airport to start the biochemistry attack, ? highly alerts after the maintenance.
Slide from Bonnie Dorr
103. BLEU Evaluation Metric (Papineni et al., ACL-2002)
Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.
- BLEU4 formula (counts n-grams up to length 4):
- BLEU4 = exp(1.0 * log p1 + 0.5 * log p2 + 0.25 * log p3 + 0.125 * log p4 - max(words-in-reference / words-in-machine - 1, 0))
- p1 = 1-gram precision
- p2 = 2-gram precision
- p3 = 3-gram precision
- p4 = 4-gram precision
Machine translation: The American ? international airport and its the office all receives one calls self the sand Arab rich business ? and so on electronic mail, which sends out; The threat will be able after public place and so on the airport to start the biochemistry attack, ? highly alerts after the maintenance.
Slide from Bonnie Dorr
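The slide's formula drops straight into code. The precisions and lengths below are invented inputs; note that the weights 1.0/0.5/0.25/0.125 follow this slide, whereas standard BLEU uses uniform weights of 0.25:

```python
import math

def bleu4(p1, p2, p3, p4, ref_len, mt_len):
    """BLEU4 as on the slide: weighted log precisions minus a
    brevity term that only fires when the output is too short."""
    brevity = max(ref_len / mt_len - 1.0, 0.0)
    return math.exp(1.0 * math.log(p1) + 0.5 * math.log(p2)
                    + 0.25 * math.log(p3) + 0.125 * math.log(p4)
                    - brevity)

# Hypothetical n-gram precisions for a 40-word output vs. a 45-word reference:
print(round(bleu4(0.8, 0.5, 0.3, 0.2, ref_len=45, mt_len=40), 3))  # ~0.302
```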
104. Multiple Reference Translations
Slide from Bonnie Dorr
105. BLEU in Action
???????? (Foreign Original)
the gunman was shot to death by the police . (Reference Translation)
Machine translations:
1. the gunman was police kill .
2. wounded police jaya of
3. the gunman was shot dead by the police .
4. the gunman arrested by police kill .
5. the gunmen were killed .
6. the gunman was shot to death by the police .
7. gunmen were killed by police ?SUB>0 ?SUB>0
8. al by the police .
9. the ringer is killed by the police .
10. police killed the gunman .
Slide from Bonnie Dorr
106. BLEU in Action
???????? (Foreign Original)
the gunman was shot to death by the police . (Reference Translation)
Machine translations:
1. the gunman was police kill .
2. wounded police jaya of
3. the gunman was shot dead by the police .
4. the gunman arrested by police kill .
5. the gunmen were killed .
6. the gunman was shot to death by the police .
7. gunmen were killed by police ?SUB>0 ?SUB>0
8. al by the police .
9. the ringer is killed by the police .
10. police killed the gunman .
green = 4-gram match (good!); red = word not matched (bad!)
Slide from Bonnie Dorr
107. BLEU Comparison
Chinese-English Translation Example:
Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.
Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
Slide from Bonnie Dorr
108. How Do We Compute BLEU Scores?
- Intuition: what percentage of words in the candidate occurred in some human translation?
- Proposal: count up the number of candidate translation words (unigrams) in any reference translation, divide by the total number of words in the candidate translation
- But we can't just count the total number of overlapping N-grams!
- Candidate: the the the the the the
- Reference 1: The cat is on the mat
- Solution: a reference word should be considered exhausted after a matching candidate word is identified.
Slide from Bonnie Dorr
109. Modified n-gram precision
- For each word compute:
- (1) the total number of times it occurs in any single reference translation
- (2) the number of times it occurs in the candidate translation
- Instead of using count (2), use the minimum of (2) and (1), i.e. clip the counts at the maximum for the reference translation
- Now use that modified count
- And divide by the number of candidate words.
Slide from Bonnie Dorr
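A sketch of clipped ("modified") n-gram precision; the last line reproduces the 2/7 unigram answer from the "Catching Cheaters" example a few slides below:

```python
from collections import Counter

def modified_precision(candidate, references, n=1):
    """Clip each candidate n-gram count at its maximum count in any
    single reference, then divide by the total candidate n-gram count."""
    def ngrams(text):
        words = text.lower().split()
        return Counter(zip(*(words[i:] for i in range(n))))
    cand = ngrams(candidate)
    max_ref = Counter()
    for ref in references:
        for gram, c in ngrams(ref).items():
            max_ref[gram] = max(max_ref[gram], c)
    clipped = sum(min(c, max_ref[gram]) for gram, c in cand.items())
    return clipped, sum(cand.values())

refs = ["The cat is on the mat", "There is a cat on the mat"]
print(modified_precision("the the the the the the the", refs))  # (2, 7)
```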
110. Modified Unigram Precision: Candidate 1
It(1) is(1) a(1) guide(1) to(1) action(1) which(1) ensures(1) that(2) the(4) military(1) always(1) obeys(0) the commands(1) of(1) the party(1)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer??? 17/18
Slide from Bonnie Dorr
111. Modified Unigram Precision: Candidate 2
It(1) is(1) to(1) insure(0) the(4) troops(0) forever(1) hearing(0) the activity(0) guidebook(0) that(2) party(1) direct(0)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer???? 8/14
Slide from Bonnie Dorr
112. Modified Bigram Precision: Candidate 1
It is(1) is a(1) a guide(1) guide to(1) to action(1) action which(0) which ensures(0) ensures that(1) that the(1) the military(1) military always(0) always obeys(0) obeys the(0) the commands(0) commands of(0) of the(1) the party(1)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer???? 10/17
Slide from Bonnie Dorr
113. Modified Bigram Precision: Candidate 2
It is(1) is to(0) to insure(0) insure the(0) the troops(0) troops forever(0) forever hearing(0) hearing the(0) the activity(0) activity guidebook(0) guidebook that(0) that party(0) party direct(0)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer???? 1/13
Slide from Bonnie Dorr
114. Catching Cheaters
the(2) the the the(0) the(0) the(0) the(0)
Reference 1: The cat is on the mat
Reference 2: There is a cat on the mat
What's the unigram answer? 2/7
What's the bigram answer? 0/7
Slide from Bonnie Dorr
115. BLEU distinguishes human from machine translations
Slide from Bonnie Dorr
116. BLEU: problems with sentence length
- Candidate: "of the"
- Problem: its modified unigram precision is 2/2 and bigram precision 1/1!
- Solution: a brevity penalty prefers candidate translations which are the same length as one of the references
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
Slide from Bonnie Dorr
117. BLEU Tends to Predict Human Judgments
(variant of BLEU)
slide from G. Doddington (NIST)
118. Summary
- Intro and a little history
- Language Similarities and Divergences
- Four main MT Approaches
- Transfer
- Interlingua
- Direct
- Statistical
- Evaluation
119. Classes
- LINGUIST 139M/239M. Human and Machine Translation (Martin Kay)
- CS 224N. Natural Language Processing (Chris Manning)