Transcript and Presenter's Notes

Title: CS60057 Speech


1
CS60057 Speech and Natural Language Processing
  • Autumn 2008

Lecture 16, 3 Sep 2008
2
Outline for MT
  • Intro and a little history
  • Language Similarities and Divergences
  • Three classic MT Approaches
  • Transfer
  • Interlingua
  • Direct
  • Modern Statistical MT
  • Evaluation

3
What is MT?
  • Translating a text from one language to another
    automatically.

4
Machine Translation
  • dai yu zi zai chuang shang gan nian bao chai you
    ting jian chuang wai zhu shao xiang ye zhe shang,
    yu sheng xi li, qing han tou mu, bu jue you di
    xia lei lai.
  • Dai-yu alone on bed top think-of-with-gratitude
    Bao-chai again listen to window outside bamboo
    tip plantain leaf of on-top rain sound sigh drop
    clear cold penetrate curtain not feeling again
    fall down tears come
  • As she lay there alone, Dai-yu's thoughts turned
    to Bao-chai. Then she listened to the insistent
    rustle of the rain on the bamboos and plantains
    outside her window. The coldness penetrated the
    curtains of her bed. Almost without noticing it
    she had begun to cry.

5
Machine Translation
6
Machine Translation
  • The Story of the Stone
  • The Dream of the Red Chamber (Cao Xueqin 1792)
  • Issues
  • Word segmentation
  • Sentence segmentation 4 English sentences to 1
    Chinese
  • Grammatical differences
  • Chinese rarely marks tense
  • As, turned to, had begun,
  • tou → penetrated
  • Zero anaphora
  • No articles
  • Stylistic and cultural differences
  • Bamboo tip plantain leaf → bamboos and
    plantains
  • Ma curtain → curtains of her bed
  • Rain sound sigh drop → insistent rustle of the
    rain

7
Not just literature
  • Hansards: Canadian parliamentary proceedings

8
What is MT not good for?
  • Really hard stuff
  • Literature
  • Natural spoken speech (meetings, court reporting)
  • Really important stuff
  • Medical translation in hospitals, 911

9
What is MT good for?
  • Tasks for which a rough translation is fine
  • Web pages, email
  • Tasks for which MT can be post-edited
  • MT as first pass
  • Computer-aided human translation
  • Tasks in sublanguage domains where high-quality
    MT is possible
  • FAHQT (Fully Automatic High-Quality Translation)

10
Sublanguage domain
  • Weather forecasting
  • Cloudy with a chance of showers today and
    Thursday
  • Low tonight 4
  • Can be modeled completely enough to use raw MT
    output
  • Word classes and semantic features like MONTH,
    PLACE, DIRECTION, TIME POINT

11
MT History
  • 1946 Booth and Weaver discuss MT at Rockefeller
    foundation in New York
  • 1947-48 idea of dictionary-based direct
    translation
  • 1949 Weaver memorandum popularized idea
  • 1952 all 18 MT researchers in world meet at MIT
  • 1954 IBM/Georgetown Demo Russian-English MT
  • 1955-65 lots of labs take up MT

12
History of MT Pessimism
  • 1959/1960 Bar-Hillel Report on the state of MT
    in US and GB
  • Argued FAHQT too hard (semantic ambiguity, etc)
  • Should work on semi-automatic instead of
    automatic
  • His argument: Little John was looking for his toy
    box. Finally, he found it. The box was in the
    pen. John was very happy.
  • Only human knowledge lets us know that
    playpens are bigger than boxes, but writing
    pens are smaller
  • His claim: we would have to encode all of human
    knowledge

13
History of MT Pessimism
  • The ALPAC report (1966)
  • Headed by John R. Pierce of Bell Labs
  • Conclusions
  • Supply of human translators exceeds demand
  • All the Soviet literature is already being
    translated
  • MT has been a failure all current MT work had to
    be post-edited
  • Sponsored evaluations which showed that
    intelligibility and informativeness were worse
    than for human translations
  • Results
  • MT research suffered
  • Funding loss
  • Number of research labs declined
  • Association for Machine Translation and
    Computational Linguistics dropped MT from its
    name

14
History of MT
  • 1976 Meteo, weather forecasts from English to
    French
  • Systran (Babelfish) has been used for 40 years
  • 1970s
  • European focus in MT mainly ignored in US
  • 1980s
  • ideas of using AI techniques in MT (KBMT, CMU)
  • 1990s
  • Commercial MT systems
  • Statistical MT
  • Speech-to-speech translation

15
Language Similarities and Divergences
  • Some aspects of human language are universal or
    near-universal, others diverge greatly.
  • Typology: the study of systematic
    cross-linguistic similarities and differences
  • What are the dimensions along which human
    languages vary?

16
Morphological Variation
  • Isolating languages
  • Cantonese, Vietnamese: each word generally has
    one morpheme
  • Vs. polysynthetic languages
  • Siberian Yupik (Eskimo): a single word may have
    very many morphemes
  • Agglutinative languages
  • Turkish: morphemes have clean boundaries
  • Vs. fusion languages
  • Russian: a single affix may conflate many morphemes

17
Syntactic Variation
  • SVO (Subject-Verb-Object) languages
  • English, German, French, Mandarin
  • SOV Languages
  • Japanese, Hindi
  • VSO languages
  • Irish, Classical Arabic
  • SVO languages generally have prepositions: to Yuriko
  • SOV languages generally have postpositions: Yuriko ni

18
Segmentation Variation
  • Not every writing system has word boundaries
    marked
  • Chinese, Japanese, Thai, Vietnamese
  • Some languages tend to have sentences that are
    quite long, closer to English paragraphs than
    sentences
  • Modern Standard Arabic, Chinese

19
Inferential Load: cold vs. hot languages
  • Some cold languages require the hearer to do
    more figuring out of who the various actors in
    the various events are
  • Japanese, Chinese,
  • Other hot languages are pretty explicit about
    saying who did what to whom.
  • English

20
Inferential Load (2)
All noun phrases in blue do not appear in the Chinese
text, but they are needed for a good translation.
21
Lexical Divergences
  • Word to phrases
  • English computer science = French
    informatique
  • POS divergences
  • Eng. she likes/VERB to sing
  • Ger. Sie singt gerne/ADV
  • Eng. I'm hungry/ADJ
  • Sp. tengo hambre/NOUN

22
Lexical Divergences Specificity
  • Grammatical constraints
  • English has gender on pronouns; Mandarin does not.
  • So translating 3rd person from Chinese to
    English, need to figure out gender of the person!
  • Similarly from English they to French
    ils/elles
  • Semantic constraints
  • English: brother
  • Mandarin: gege (older) versus didi (younger)
  • English: wall
  • German: Wand (inside) versus Mauer (outside)
  • German: Berg
  • English: hill or mountain

23
Lexical Divergence many-to-many
24
Lexical Divergence lexical gaps
  • Japanese no word for privacy
  • English no word for Cantonese haauseun or
    Japanese oyakoko (something like filial
    piety)
  • English cow versus beef, Cantonese ngau

25
Event-to-argument divergences
  • English
  • The bottle floated out.
  • Spanish
  • La botella salió flotando.
  • The bottle exited floating
  • Verb-framed languages mark direction of motion on the verb
  • Spanish, French, Arabic, Hebrew, Japanese, Tamil,
    Polynesian, Mayan, Bantu families
  • Satellite-framed languages mark direction of motion on the
    satellite
  • Crawl out, float off, jump down, walk over to,
    run after
  • Rest of Indo-European, Hungarian, Finnish, Chinese

26
Structural divergences
  • G: Wir treffen uns am Mittwoch
  • E: We'll meet on Wednesday

27
Head Swapping
  • E: X swim across Y
  • S: X cruzar Y nadando
  • E: I like to eat
  • G: Ich esse gern
  • E: I'd prefer vanilla
  • G: Mir wäre Vanille lieber

28
Thematic divergence
  • S: Y me gusta
  • E: I like Y
  • G: Mir fällt der Termin ein
  • E: I remember the date (lit. "to me occurs the date")

29
Divergence counts from Bonnie Dorr
  • 32% of sentences in UN Spanish/English Corpus (5K)

Divergence type   Spanish example        English example     Frequency
Categorial        X tener hambre         Y have hunger       98
Conflational      X dar puñaladas a Z    X stab Z            83
Structural        X entrar en Y          X enter Y           35
Head swapping     X cruzar Y nadando     X swim across Y      8
Thematic          X gustar a Y           Y likes X            6
30
MT on the web
  • Babelfish
  • http://babelfish.altavista.com/
  • Google
  • http://www.google.com/search?... (query: "1 taza de jugo
    (zumo) de naranja 5 cucharadas de azucar morena")

31
3 methods for MT
  • Direct
  • Transfer
  • Interlingua

32
Three MT Approaches: Direct, Transfer, Interlingual
33
Direct Translation
  • Proceed word-by-word through text
  • Translating each word
  • No intermediate structures except morphology
  • Knowledge is in the form of
  • Huge bilingual dictionary
  • word-to-word translation information
  • After word translation, can do simple reordering
  • Adjective ordering: English → French/Spanish

34
Direct MT Dictionary entry
35
Direct MT
36
Problems with direct MT
  • German
  • Chinese

37
The Transfer Model
  • Idea apply contrastive knowledge, i.e.,
    knowledge about the difference between two
    languages
  • Steps
  • Analysis: syntactically parse the source language
  • Transfer: rules to turn this parse into a parse for
    the target language
  • Generation: generate the target sentence from the parse
    tree

38
English to French
  • Generally
  • English: Adjective Noun
  • French: Noun Adjective
  • Note: not always true
  • Route mauvaise: bad road, badly-paved road
  • Mauvaise route: wrong road
  • But it is a reasonable first approximation
  • Rule

39
Transfer rules
40
Lexical transfer
  • Transfer-based systems also need lexical transfer
    rules
  • Bilingual dictionary (like for direct MT)
  • English home
  • German
  • nach Hause (going home)
  • Heim (home game)
  • Heimat (homeland, home country)
  • zu Hause (at home)
  • Can list at home ↔ zu Hause
  • Or do Word Sense Disambiguation

41
Systran: combining direct and transfer
  • Analysis
  • Morphological analysis, POS tagging
  • Chunking of NPs, PPs, phrases
  • Shallow dependency parsing
  • Transfer
  • Translation of idioms
  • Word sense disambiguation
  • Assigning prepositions based on governing verbs
  • Synthesis
  • Apply rich bilingual dictionary
  • Deal with reordering
  • Morphological generation

42
Transfer some problems
  • N² sets of transfer rules!
  • Grammar and lexicon full of language-specific
    stuff
  • Hard to build, hard to maintain

43
Interlingua
  • Intuition: instead of language-to-language transfer rules,
    use the meaning of the sentence to help
  • Steps
  • 1) translate source sentence into meaning
    representation
  • 2) generate target sentence from meaning.

44
Interlingua for Mary did not slap the green witch
45
Interlingua
  • Idea is that some of the MT work that we need to
    do is part of other NLP tasks
  • E.g., disambiguating English book = Spanish libro from
    English book = Spanish reservar
  • So we could have concepts like BOOKVOLUME and
    RESERVE and solve this problem once for each
    language

46
Direct MT pros and cons (Bonnie Dorr)
  • Pros
  • Fast
  • Simple
  • Cheap
  • No translation rules hidden in lexicon
  • Cons
  • Unreliable
  • Not powerful
  • Rule proliferation
  • Requires lots of context
  • Major restructuring after lexical substitution

47
Interlingual MT pros and cons (B. Dorr)
  • Pros
  • Avoids the N² problem
  • Easier to write rules
  • Cons
  • Semantics is HARD
  • Useful information lost (paraphrase)

48
The impossibility of translation
  • Hebrew adonai ro'i ("the Lord is my shepherd") for a
    culture without sheep or shepherds
  • Something fluent and understandable, but not
    faithful
  • The Lord will look after me
  • Something faithful, but not fluent and natural
  • The Lord is for me like somebody who looks after
    animals with cotton-like hair

49
What makes a good translation
  • Translators often talk about two factors we want
    to maximize
  • Faithfulness or fidelity
  • How close is the meaning of the translation to
    the meaning of the original
  • (Even better: does the translation cause the
    reader to draw the same inferences as the
    original would have?)
  • Fluency or naturalness
  • How natural the translation is, just considering
    its fluency in the target language

50
Statistical MT Systems
Spanish/English Bilingual Text
English Text
Statistical Analysis
Statistical Analysis
Broken English
Spanish
English
What hunger have I, Hungry I am so, I am so
hungry, Have I that hunger
Que hambre tengo yo
I am so hungry
Slide from Kevin Knight
51
Statistical MT Systems
Spanish/English Bilingual Text
English Text
Statistical Analysis
Statistical Analysis
Broken English
Spanish
English
Translation Model P(s|e)
Language Model P(e)
Que hambre tengo yo
I am so hungry
Decoding algorithm: argmax over e of P(e) P(s|e)
Slide from Kevin Knight
52
Statistical MT: Faithfulness and Fluency
formalized!
  • Best-translation of a source sentence S
  • Developed by researchers who were originally in
    speech recognition at IBM
  • Called the IBM model

53
Three Problems for Statistical MT
  • Language model
  • Given an English string e, assigns P(e) by
    formula
  • good English string → high P(e)
  • random word sequence → low P(e)
  • Translation model
  • Given a pair of strings <f,e>, assigns P(f|e)
    by formula
  • <f,e> look like translations → high P(f|e)
  • <f,e> don't look like translations → low P(f|e)
  • Decoding algorithm
  • Given a language model, a translation model, and
    a new sentence f, find translation e maximizing
    P(e) P(f|e)

Slide from Kevin Knight
54
The IBM model
  • Hmm, those two factors might look familiar…
  • Yup, it's Bayes' rule

55
More formally
  • Assume we are translating from a foreign language
    sentence F to an English sentence E
  • F = f1, f2, f3, …, fm
  • We want to find the best English sentence
    E-hat = e1, e2, e3, …, en
  • E-hat = argmax_E P(E|F)
  •       = argmax_E P(F|E) P(E) / P(F)
  •       = argmax_E P(F|E) P(E)
  • where P(F|E) is the translation model and P(E) is
    the language model
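
A minimal Python sketch of this decision rule, using the slides' example
"Que hambre tengo yo"; the candidate set and all probability values below
are invented purely for illustration, not taken from any trained model.

# Noisy-channel decision rule: E-hat = argmax_E P(F|E) * P(E)
candidates = ["I am so hungry", "What hunger have I", "Hungry I am so"]

# Hypothetical translation-model scores P(F|E) for F = "Que hambre tengo yo":
# the literal rendering scores highest on faithfulness alone.
p_f_given_e = {
    "I am so hungry": 0.08,
    "What hunger have I": 0.12,
    "Hungry I am so": 0.10,
}

# Hypothetical language-model scores P(E): fluent English gets higher probability.
p_e = {
    "I am so hungry": 0.009,
    "What hunger have I": 0.0002,
    "Hungry I am so": 0.0001,
}

best = max(candidates, key=lambda e: p_f_given_e[e] * p_e[e])
print(best)  # "I am so hungry" -- fluency outweighs the more literal candidates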
56
The noisy channel model for MT
57
Fluency P(T)
  • How to measure that this sentence
  • That car was almost crash onto me
  • is less fluent than this one
  • That car almost hit me.
  • Answer: language models (N-grams!)
  • For example P(hit | almost) > P(was | almost)
  • But we can use any other more sophisticated model of
    grammar
  • Advantage: this is monolingual knowledge!

58
Faithfulness P(S|T)
  • French: ça me plaît ("that me pleases")
  • English
  • that pleases me - most fluent
  • I like it
  • I'll take that one
  • How to quantify this?
  • Intuition degree to which words in one sentence
    are plausible translations of words in other
    sentence
  • Product of probabilities that each word in target
    sentence would generate each word in source
    sentence.

59
Faithfulness P(S|T)
  • Need to know, for every target language word,
    probability of it mapping to every source
    language word.
  • How do we learn these probabilities?
  • Parallel texts!
  • Lots of times we have two texts that are
    translations of each other
  • If we knew which word in Source Text mapped to
    each word in Target Text, we could just count!
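
A tiny sketch of the "just count" idea: if the word alignments were already
known, P(source word | target word) would be a simple relative frequency. The
aligned pairs below are made up for illustration.

from collections import Counter, defaultdict

# (source word, target word) pairs read off known word alignments (invented data)
aligned_pairs = [("la", "the"), ("maison", "house"), ("la", "the"), ("fleur", "flower")]

pair_counts = Counter(aligned_pairs)
target_counts = Counter(t for _, t in aligned_pairs)

# count-and-divide: P(s | t) = count(s, t) / count(t)
p_s_given_t = defaultdict(dict)
for (s, t), c in pair_counts.items():
    p_s_given_t[t][s] = c / target_counts[t]

print(p_s_given_t["the"])  # {'la': 1.0}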

60
Faithfulness P(S|T)
  • Sentence alignment
  • Figuring out which source language sentence maps
    to which target language sentence
  • Word alignment
  • Figuring out which source language word maps to
    which target language word

61
Big Point about Faithfulness and Fluency
  • Job of the faithfulness model P(S|T) is just to
    model a bag of words: which words come from, say,
    English to Spanish.
  • P(S|T) doesn't have to worry about internal facts
    about Spanish word order; that's the job of P(T)
  • P(T) can do bag generation: put the following
    words in order (from Kevin Knight)
  • have programming a seen never I language better

-actual the hashing is since not collision-free
usually the is less perfectly the of somewhat
capacity table
62
P(T) and bag generation: the answer
  • Usually the actual capacity of the table is
    somewhat less, since the hashing is not
    collision-free
  • How about: loves Mary John (see the sketch below)
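
A toy sketch of bag generation: score every ordering of the bag with a bigram
model and keep the best. The bigram probabilities here are invented; a real
system would use a trained N-gram language model.

from itertools import permutations

bigram = {("John", "loves"): 0.5, ("loves", "Mary"): 0.4,
          ("Mary", "loves"): 0.3, ("loves", "John"): 0.2}

def score(words):
    # product of bigram probabilities, with a small floor for unseen pairs
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= bigram.get((w1, w2), 0.01)
    return p

bag = ["loves", "Mary", "John"]
print(max(permutations(bag), key=score))  # ('John', 'loves', 'Mary')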

63
Three Problems for Statistical MT
  • Language model
  • Given an English string e, assigns P(e) by
    formula
  • good English string → high P(e)
  • random word sequence → low P(e)
  • Translation model
  • Given a pair of strings <f,e>, assigns P(f|e)
    by formula
  • <f,e> look like translations → high P(f|e)
  • <f,e> don't look like translations → low P(f|e)
  • Decoding algorithm
  • Given a language model, a translation model, and
    a new sentence f, find translation e maximizing
    P(e) P(f|e)

Slide from Kevin Knight
64
The Classic Language Model: Word N-Grams
  • Goal of the language model -- choose among
  • He is on the soccer field
  • He is in the soccer field
  • Is table the on cup the
  • The cup is on the table
  • Rice shrine
  • American shrine
  • Rice company
  • American company

Slide from Kevin Knight
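
A small sketch of the kind of language model meant here: a bigram model with
add-one smoothing, trained on a toy corpus (invented for illustration), which
prefers the fluent word order over the scrambled one.

import math
from collections import Counter

corpus = ["he is on the soccer field",
          "the cup is on the table",
          "he is happy"]

tokens, bigrams, vocab = Counter(), Counter(), set()
for sent in corpus:
    words = ["<s>"] + sent.split()
    vocab.update(words)
    tokens.update(words[:-1])                  # bigram contexts
    bigrams.update(zip(words[:-1], words[1:]))

def logprob(sentence):
    # add-one smoothed bigram log-probability
    words = ["<s>"] + sentence.split()
    V = len(vocab)
    return sum(math.log((bigrams[(w1, w2)] + 1) / (tokens[w1] + V))
               for w1, w2 in zip(words[:-1], words[1:]))

print(logprob("the cup is on the table"))   # higher (less negative)
print(logprob("is table the on cup the"))   # lower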
65
Intuition of phrase-based translation (Koehn et
al. 2003)
  • Generative story has three steps
  • Group words into phrases
  • Translate each phrase
  • Move the phrases around

66
Generative story again
  • Group English source words into phrases e1, e2,
    …, en
  • Translate each English phrase ei into a Spanish
    phrase fj.
  • The probability of doing this is φ(fj | ei)
  • Then (optionally) reorder each Spanish phrase
  • We do this with a distortion probability
  • A measure of distance between positions of a
    corresponding phrase in the two languages.
  • What is the probability that a phrase in
    position X in the English sentence moves to
    position Y in the Spanish sentence?

67
Distortion probability
  • The distortion probability is parameterized by
    a_i - b_{i-1}
  • Where a_i is the start position of the foreign
    (Spanish) phrase generated by the i-th English
    phrase e_i.
  • And b_{i-1} is the end position of the foreign
    (Spanish) phrase generated by the (i-1)-th English
    phrase e_{i-1}.
  • We'll call the distortion probability d(a_i - b_{i-1}).
  • And we'll have a really stupid model:
  • d(a_i - b_{i-1}) = α^|a_i - b_{i-1}|
  • Where α is some small constant (see the sketch below).
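
A rough sketch of how the pieces combine, assuming the simple form
P(F|E) = product over phrases of phi(f|e) * d(a_i - b_{i-1}); the phrase
probabilities, positions, and the constant alpha are all invented for illustration.

ALPHA = 0.5  # stand-in for the "small constant" on the slide

def distortion(a_i, b_prev):
    # d(a_i - b_{i-1}) = alpha ** |a_i - b_{i-1}|
    return ALPHA ** abs(a_i - b_prev)

# (phi(f|e), start position a_i, end position b_i) for each translated phrase
phrases = [(0.6, 1, 2), (0.4, 3, 3), (0.7, 4, 6)]

score, b_prev = 1.0, 0
for phi, a_i, b_i in phrases:
    score *= phi * distortion(a_i, b_prev)
    b_prev = b_i
print(score)  # P(F|E) under this toy phrase segmentation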

68
Final translation model for phrase-based MT
  • Let's look at a simple example with no distortion

69
Phrase-based MT
  • Language model P(E)
  • Translation model P(F|E)
  • Model
  • How to train the model
  • Decoder: finding the sentence E that is most
    probable

70
Training P(FE)
  • What we mainly need to train is φ(fj | ei)
  • Suppose we had a large bilingual training corpus
  • A bitext
  • In which each English sentence is paired with a
    Spanish sentence
  • And suppose we knew exactly which phrase in
    Spanish was the translation of which phrase in
    the English
  • We call this a phrase alignment
  • If we had this, we could just count-and-divide

71
But we don't have phrase alignments
  • What we have instead are word alignments

72
Getting phrase alignments
  • To get phrase alignments
  • We first get word alignments
  • Then we symmetrize the word alignments into
    phrase alignments

73
How to get Word Alignments
  • Word alignment: a mapping between the source
    words and the target words in a set of parallel
    sentences.
  • Restriction: each foreign word comes from exactly
    1 English word
  • Advantage: we can represent an alignment by the index of
    the English word that the French word comes from
  • Alignment above is thus 2,3,4,5,6,6,6

74
One addition spurious words
  • A word in the foreign sentence
    That doesn't align with any word in the English
    sentence
  • Is called a spurious word.
  • We model these by pretending they are generated
    by an English word e0

75
More sophisticated models of alignment
76
Computing word alignments IBM Model 1
  • For phrase-based machine translation
  • We want a word-alignment
  • To extract a set of phrases
  • A word alignment algorithm gives us P(F,E)
  • We want this to train our phrase probabilities
    φ(fj | ei) as part of P(F|E)
  • But a word-alignment algorithm can also be part
    of a mini-translation model itself.

77
IBM Model 1
78
IBM Model 1
79
How does the generative story assign P(F|E) for a
Spanish sentence F?
  • Terminology
  • Suppose we had done steps 1 and 2, i.e. we
    already knew the Spanish length J and the
    alignment A (and English source E)

80
Lets formalize steps 1 and 2
  • We want P(A|E) of an alignment A (of length J)
    given an English sentence E
  • IBM Model 1 makes the (very) simplifying
    assumption that each alignment is equally likely.
  • How many possible alignments are there between an
    English sentence of length I and a Spanish sentence
    of length J?
  • Hint: each Spanish word must come from one of the
    English source words (or the NULL word)
  • (I+1)^J
  • Let's assume the probability of choosing length J is a
    small constant epsilon

81
Model 1 continued
  • Prob of choosing a length and then one of the
    possible alignments
  • Combining with step 3
  • The total probability of a given foreign sentence
    F
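
Putting slides 80-81 together, a minimal sketch of the Model 1 sentence
probability P(F|E) = epsilon / (I+1)^J * product over j of sum over i of
t(f_j | e_i), where i ranges over the English words plus NULL. The t() table
and epsilon value below are invented for illustration.

EPSILON = 1.0  # stand-in for the small constant probability of choosing length J

def model1_prob(f_words, e_words, t):
    e_with_null = ["NULL"] + e_words
    I, J = len(e_words), len(f_words)
    prob = EPSILON / (I + 1) ** J          # the epsilon / (I+1)^J term
    for f in f_words:                      # marginalize over all alignments
        prob *= sum(t.get((f, e), 0.0) for e in e_with_null)
    return prob

t = {("la", "the"): 0.7, ("la", "NULL"): 0.05,
     ("maison", "house"): 0.8, ("maison", "the"): 0.1}
print(model1_prob(["la", "maison"], ["the", "house"], t))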

82
Decoding
  • How do we find the best A?

83
Training alignment probabilities
  • Step 1: get a parallel corpus
  • Hansards
  • Canadian parliamentary proceedings, in French and
    English
  • Hong Kong Hansards: English and Chinese
  • Step 2: sentence alignment
  • Step 3: use EM (Expectation Maximization) to
    train word alignments

84
Step 1 Parallel corpora
  • Example from DE-News (8/1/1996)

English: Diverging opinions about planned tax reform
German:  Unterschiedliche Meinungen zur geplanten Steuerreform

English: The discussion around the envisaged major tax reform continues .
German:  Die Diskussion um die vorgesehene grosse Steuerreform dauert an .

English: The FDP economics expert , Graf Lambsdorff , today came out in favor of advancing the enactment of significant parts of the overhaul , currently planned for 1999 .
German:  Der FDP - Wirtschaftsexperte Graf Lambsdorff sprach sich heute dafuer aus , wesentliche Teile der fuer 1999 geplanten Reform vorzuziehen .
Slide from Christof Monz
85
Step 2 Sentence Alignment
  • The old man is happy. He has fished many times.
    His wife talks to him. The fish are jumping.
    The sharks await.
  • Intuition
  • - use length in words or chars
  • - together with dynamic programming
  • - or use a simpler MT model

El viejo está feliz porque ha pescado muchos
veces. Su mujer habla con él. Los tiburones
esperan.
Slide from Kevin Knight
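
A rough sketch of length-based sentence alignment with dynamic programming, in
the spirit of Gale and Church (1993). The bead penalties and the raw
character-length cost are ad hoc stand-ins for the original statistical model.

def align(src, tgt):
    # returns a list of (source sentences, target sentences) beads
    INF = float("inf")
    n, m = len(src), len(tgt)
    chars = lambda sents: sum(len(s) for s in sents)
    # allowed beads: 1-1, 1-0, 0-1, 2-1, 1-2, with penalties for non-1-1 beads
    beads = [(1, 1, 0), (1, 0, 10), (0, 1, 10), (2, 1, 5), (1, 2, 5)]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj, pen in beads:
                if i + di <= n and j + dj <= m:
                    c = cost[i][j] + pen + abs(chars(src[i:i+di]) - chars(tgt[j:j+dj]))
                    if c < cost[i+di][j+dj]:
                        cost[i+di][j+dj] = c
                        back[i+di][j+dj] = (i, j)
    beads_out, i, j = [], n, m            # trace back the best path
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads_out.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return list(reversed(beads_out))

english = ["The old man is happy.", "He has fished many times.",
           "His wife talks to him.", "The fish are jumping.", "The sharks await."]
spanish = ["El viejo está feliz porque ha pescado muchos veces.",
           "Su mujer habla con él.", "Los tiburones esperan."]
for e, s in align(english, spanish):
    print(e, "<->", s)   # one possible segmentation into aligned beads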
86
Sentence Alignment
  1. The old man is happy.
  2. He has fished many times.
  3. His wife talks to him.
  4. The fish are jumping.
  5. The sharks await.

El viejo está feliz porque ha pescado muchos
veces. Su mujer habla con él. Los tiburones
esperan.
Slide from Kevin Knight
87
Sentence Alignment
  1. The old man is happy.
  2. He has fished many times.
  3. His wife talks to him.
  4. The fish are jumping.
  5. The sharks await.

El viejo está feliz porque ha pescado muchos
veces. Su mujer habla con él. Los tiburones
esperan.
Slide from Kevin Knight
88
Sentence Alignment
  1. The old man is happy. He has fished many times.
  2. His wife talks to him.
  3. The sharks await.

El viejo está feliz porque ha pescado muchos
veces. Su mujer habla con él. Los tiburones
esperan.
Note that unaligned sentences are thrown out,
and sentences are merged in n-to-m alignments (n,
m > 0).
Slide from Kevin Knight
89
Step 3 word alignments
  • It turns out we can bootstrap alignments
  • From a sentence-aligned bilingual corpus
  • We use the Expectation-Maximization (EM)
    algorithm

90
EM for training alignment probs
la maison la maison bleue la fleur
the house the blue house the flower
All word alignments equally likely. All
P(french-word | english-word) equally likely.
Slide from Kevin Knight
91
EM for training alignment probs
la maison la maison bleue la fleur
the house the blue house the flower
"la" and "the" observed to co-occur
frequently, so P(la | the) is increased.
Slide from Kevin Knight
92
EM for training alignment probs
la maison la maison bleue la fleur
the house the blue house the flower
"house" co-occurs with both "la" and "maison",
but P(maison | house) can be raised without
limit, to 1.0, while P(la | house) is limited
because of "the" (pigeonhole principle)
Slide from Kevin Knight
93
EM for training alignment probs
la maison la maison bleue la fleur
the house the blue house the flower
settling down after another iteration
Slide from Kevin Knight
94
EM for training alignment probs
la maison la maison bleue la fleur
the house the blue house the flower
  • Inherent hidden structure revealed by EM
    training!
  • For details, see
  • Section 24.6.1 in the chapter
  • A Statistical MT Tutorial Workbook (Knight,
    1999).
  • The Mathematics of Statistical Machine
    Translation (Brown et al, 1993)
  • Software: GIZA

Slide from Kevin Knight
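
The EM loop sketched on these slides can be written in a few lines; this is a
minimal IBM Model 1 trainer (no NULL word, uniform initialization) on the
slides' toy corpus.

from collections import defaultdict

corpus = [("la maison", "the house"),
          ("la maison bleue", "the blue house"),
          ("la fleur", "the flower")]
corpus = [(f.split(), e.split()) for f, e in corpus]

f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))   # t(f|e), uniform start

for _ in range(30):
    count = defaultdict(float)   # expected counts c(f, e)
    total = defaultdict(float)   # expected counts c(e)
    for fs, es in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                frac = t[(f, e)] / norm      # E-step: fractional alignment counts
                count[(f, e)] += frac
                total[e] += frac
    for (f, e), c in count.items():          # M-step: renormalize
        t[(f, e)] = c / total[e]

print(round(t[("la", "the")], 3), round(t[("maison", "house")], 3))  # both grow toward 1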
95
Statistical Machine Translation
la maison la maison bleue la fleur
the house the blue house the flower
P(juste | fair) = 0.411, P(juste | correct) =
0.027, P(juste | right) = 0.020
Possible English translations, to be rescored by
language model
new French sentence
Slide from Kevin Knight
96
A more complex model: IBM Model 3 (Brown et al.,
1993)
Generative approach:
Mary did not slap the green witch
   n(3 | slap)          [fertility]
Mary not slap slap slap the green witch
   P(NULL)              [spurious-word insertion]
Mary not slap slap slap NULL the green witch
   t(la | the)          [word translation]
Maria no dió una bofetada a la verde bruja
   d(j | i)             [distortion]
Maria no dió una bofetada a la bruja verde
Probabilities can be learned from raw bilingual
text.
97
How do we evaluate MT? Human tests for fluency
  • Rating tests: give the raters a scale (1 to 5)
    and ask them to rate
  • Or distinct scales for
  • Clarity, Naturalness, Style
  • Or check for specific problems
  • Cohesion (Lexical chains, anaphora, ellipsis)
  • Hand-checking for cohesion.
  • Well-formedness
  • 5-point scale of syntactic correctness
  • Comprehensibility tests
  • Noise test
  • Multiple choice questionnaire
  • Readability tests
  • cloze

98
How do we evaluate MT? Human tests for fidelity
  • Adequacy
  • Does it convey the information in the original?
  • Ask raters to rate on a scale
  • Bilingual raters: give them the source and target
    sentence, ask how much information is preserved
  • Monolingual raters: give them the target and a good
    human translation
  • Informativeness
  • Task-based: is there enough info to do some task?
  • Give raters multiple-choice questions about
    content

99
Evaluating MT Problems
  • Asking humans to judge sentences on a 5-point
    scale for 10 factors takes time and money (weeks or
    months!)
  • We can't build language engineering systems if we
    can only evaluate them once every quarter!
  • We need a metric that we can run every time we
    change our algorithm.
  • It would be OK if it wasn't perfect, but it should
    tend to correlate with the expensive human
    metrics, which we could still run quarterly.

Bonnie Dorr
100
Automatic evaluation
  • Miller and Beebe-Center (1958)
  • Assume we have one or more human translations of
    the source passage
  • Compare the automatic translation to these human
    translations
  • Bleu
  • NIST
  • Meteor
  • Precision/Recall

101
BiLingual Evaluation Understudy (BLEU; Papineni,
2001)
http://www.research.ibm.com/people/k/kishore/RC22176.pdf
  • Automatic technique, but…
  • Requires the pre-existence of Human (Reference)
    Translations
  • Approach
  • Produce corpus of high-quality human translations
  • Judge closeness numerically (word-error rate)
  • Compare n-gram matches between candidate
    translation and 1 or more reference translations

Slide from Bonnie Dorr
102
BLEU Evaluation Metric (Papineni et al, ACL-2002)
Reference (human) translation The U.S. island
of Guam is maintaining a high state of alert
after the Guam airport and its offices both
received an e-mail from someone calling himself
the Saudi Arabian Osama bin Laden and threatening
a biological/chemical attack against public
places such as the airport .
  • N-gram precision (score is between 0 and 1)
  • What percentage of machine n-grams can be found
    in the reference translation?
  • An n-gram is a sequence of n words
  • Not allowed to use the same portion of the reference
    translation twice (can't cheat by typing out "the
    the the the the")
  • Brevity penalty
  • Can't just type out a single word "the" (precision
    1.0!)
  • Amazingly hard to game the system (i.e.,
    find a way to change machine output so that BLEU
    goes up, but quality doesn't)

Machine translation The American ?
international airport and its the office all
receives one calls self the sand Arab rich
business ? and so on electronic mail , which
sends out The threat will be able after public
place and so on the airport to start the
biochemistry attack , ? highly alerts after the
maintenance.
Slide from Bonnie Dorr
103
BLEU Evaluation Metric (Papineni et al, ACL-2002)
Reference (human) translation The U.S. island
of Guam is maintaining a high state of alert
after the Guam airport and its offices both
received an e-mail from someone calling himself
the Saudi Arabian Osama bin Laden and threatening
a biological/chemical attack against public
places such as the airport .
  • BLEU4 formula
  • (counts n-grams up to length 4)
  • BLEU4 = exp( 1.0 * log p1
  •            + 0.5 * log p2
  •            + 0.25 * log p3
  •            + 0.125 * log p4
  •            - max(words-in-reference / words-in-machine - 1, 0) )
  • p1 = 1-gram precision
  • p2 = 2-gram precision
  • p3 = 3-gram precision
  • p4 = 4-gram precision

Machine translation The American ?
international airport and its the office all
receives one calls self the sand Arab rich
business ? and so on electronic mail , which
sends out The threat will be able after public
place and so on the airport to start the
biochemistry attack , ? highly alerts after the
maintenance.
Slide from Bonnie Dorr
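
The combination above can be written directly; this is a transcription of the
slide's formula with placeholder precision values (not computed from the
example translations). Note that the official BLEU definition weights the four
log-precisions uniformly, whereas the slide shows the decaying weights used here.

import math

def bleu4(p1, p2, p3, p4, ref_len, cand_len):
    ngram_part = (1.0 * math.log(p1) + 0.5 * math.log(p2)
                  + 0.25 * math.log(p3) + 0.125 * math.log(p4))
    brevity = max(ref_len / cand_len - 1, 0)   # penalizes candidates shorter than the reference
    return math.exp(ngram_part - brevity)

print(bleu4(0.8, 0.5, 0.3, 0.2, ref_len=60, cand_len=55))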
104
Multiple Reference Translations
Slide from Bonnie Dorr
105
BLEU in Action
???????? (Foreign Original)
the gunman was shot to death by the police . (Reference Translation)

the gunman was police kill .                  #1
wounded police jaya of                        #2
the gunman was shot dead by the police .      #3
the gunman arrested by police kill .          #4
the gunmen were killed .                      #5
the gunman was shot to death by the police .  #6
gunmen were killed by police ?SUB>0 ?SUB>0    #7
al by the police .                            #8
the ringer is killed by the police .          #9
police killed the gunman .                    #10
Slide from Bonnie Dorr
106
BLEU in Action
???????? (Foreign Original)
the gunman was shot to death by the police . (Reference Translation)

the gunman was police kill .                  #1
wounded police jaya of                        #2
the gunman was shot dead by the police .      #3
the gunman arrested by police kill .          #4
the gunmen were killed .                      #5
the gunman was shot to death by the police .  #6
gunmen were killed by police ?SUB>0 ?SUB>0    #7
al by the police .                            #8
the ringer is killed by the police .          #9
police killed the gunman .                    #10

green = 4-gram match (good!); red = word not matched (bad!)
Slide from Bonnie Dorr
107
Bleu Comparison
Chinese-English Translation Example

Candidate 1: It is a guide to action which ensures that the
military always obeys the commands of the party.
Candidate 2: It is to insure the troops forever hearing the
activity guidebook that party direct.

Reference 1: It is a guide to action that ensures that the
military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the
military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to
heed the directions of the party.
Slide from Bonnie Dorr
108
How Do We Compute Bleu Scores?
  • Intuition: what percentage of words in the candidate
    occurred in some human translation?
  • Proposal: count up the number of candidate translation
    words (unigrams) that appear in any reference translation,
    divide by the total number of words in the candidate
    translation
  • But can't just count the total number of overlapping
    N-grams!
  • Candidate: the the the the the the the
  • Reference 1: The cat is on the mat
  • Solution: a reference word should be considered
    exhausted after a matching candidate word is
    identified.

Slide from Bonnie Dorr
109
Modified n-gram precision
  • For each word compute:
  • (1) total number of times it occurs in any
    single reference translation
  • (2) number of times it occurs in the candidate
    translation
  • Instead of using count (2), use the minimum of (2)
    and (1), i.e. clip the counts at the max for the
    reference translation
  • Now use that modified count.
  • And divide by the number of candidate words.

Slide from Bonnie Dorr
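
A small sketch of the clipping rule, checked against the "the the the …"
example shown earlier and again a few slides later.

from collections import Counter

def modified_unigram_precision(candidate, references):
    cand = Counter(candidate.lower().split())
    matched = 0
    for word, count in cand.items():
        # clip by the maximum count of this word in any single reference
        max_ref = max(Counter(r.lower().split())[word] for r in references)
        matched += min(count, max_ref)
    return matched, sum(cand.values())

refs = ["The cat is on the mat", "There is a cat on the mat"]
print(modified_unigram_precision("the the the the the the the", refs))  # (2, 7)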
110
Modified Unigram Precision: Candidate 1
It(1) is(1) a(1) guide(1) to(1) action(1)
which(1) ensures(1) that(2) the(4) military(1)
always(1) obeys(0) the commands(1) of(1) the
party(1)
Reference 1 It is a guide to action that ensures
that the military will forever heed Party
commands. Reference 2 It is the guiding
principle which guarantees the military forces
always being under the command of the
Party. Reference 3 It is the practical guide for
the army always to heed the directions of the
party.
What's the answer?
17/18
Slide from Bonnie Dorr
111
Modified Unigram Precision: Candidate 2
It(1) is(1) to(1) insure(0) the(4) troops(0)
forever(1) hearing(0) the activity(0)
guidebook(0) that(2) party(1) direct(0)
Reference 1 It is a guide to action that ensures
that the military will forever heed Party
commands. Reference 2 It is the guiding
principle which guarantees the military forces
always being under the command of the
Party. Reference 3 It is the practical guide for
the army always to heed the directions of the
party.
What's the answer?
8/14
Slide from Bonnie Dorr
112
Modified Bigram Precision: Candidate 1
It is(1) is a(1) a guide(1) guide to(1) to
action(1) action which(0) which ensures(0)
ensures that(1) that the(1) the military(1)
military always(0) always obeys(0) obeys the(0)
the commands(0) commands of(0) of the(1) the
party(1)
Reference 1 It is a guide to action that ensures
that the military will forever heed Party
commands. Reference 2 It is the guiding
principle which guarantees the military forces
always being under the command of the
Party. Reference 3 It is the practical guide for
the army always to heed the directions of the
party.
10/17
What's the answer?
Slide from Bonnie Dorr
113
Modified Bigram Precision: Candidate 2
It is(1) is to(0) to insure(0) insure the(0) the
troops(0) troops forever(0) forever hearing(0)
hearing the(0) the activity(0) activity
guidebook(0) guidebook that(0) that party(0)
party direct(0)
Reference 1 It is a guide to action that ensures
that the military will forever heed Party
commands. Reference 2 It is the guiding
principle which guarantees the military forces
always being under the command of the
Party. Reference 3 It is the practical guide for
the army always to heed the directions of the
party.
What's the answer?
1/13
Slide from Bonnie Dorr
114
Catching Cheaters
the(2) the the the(0) the(0) the(0) the(0)
Reference 1 The cat is on the mat Reference 2
There is a cat on the mat
What's the unigram answer?
2/7
What's the bigram answer?
0/7
Slide from Bonnie Dorr
115
Bleu distinguishes human from machine translations
Slide from Bonnie Dorr
116
BLEU problems with sentence length
  • Candidate: of the
  • Solution: brevity penalty prefers candidate
    translations which are the same length as one of the
    references

Reference 1 It is a guide to action that ensures
that the military will forever heed Party
commands. Reference 2 It is the guiding
principle which guarantees the military forces
always being under the command of the
Party. Reference 3 It is the practical guide for
the army always to heed the directions of the
party.
Problem: modified unigram precision is 2/2,
bigram precision 1/1!
Slide from Bonnie Dorr
117
BLEU Tends to Predict Human Judgments
(variant of BLEU)
slide from G. Doddington (NIST)
118
Summary
  • Intro and a little history
  • Language Similarities and Divergences
  • Four main MT Approaches
  • Transfer
  • Interlingua
  • Direct
  • Statistical
  • Evaluation

119
Classes
  • LINGUIST 139M/239M. Human and Machine
    Translation. (Martin Kay)
  • CS 224N. Natural Language Processing (Chris
    Manning)