LING 138238 SYMBSYS 138 Intro to Computer Speech and Language Processing

1
LING 138/238, SYMBSYS 138: Intro to Computer Speech
and Language Processing
  • Lecture 12: Machine Translation (II)
  • November 4, 2004
  • Dan Jurafsky

Thanks to Kevin Knight for much of this material!!
2
Outline for MT Week
  • Intro and a little history
  • Language Similarities and Divergences
  • Four main MT Approaches
  • Transfer
  • Interlingua
  • Direct
  • Statistical
  • Evaluation

3
Thanks to Bonnie Dorr!
  • Next ten slides draw from her slides on BLEU

4
How do we evaluate MT? Human evaluation
  • Fluency
  • Overall fluency
  • Human rating of sentences read out loud
  • Cohesion (Lexical chains, anaphora, ellipsis)
  • Hand-checking for cohesion.
  • Well-formedness
  • 5-point scale of syntactic correctness
  • Fidelity (same information as source?)
  • Hand rating of target text on 100pt scale
  • Clarity
  • Comprehensibility
  • Noise test
  • Multiple choice questionnaire
  • Readability
  • cloze

5
Evaluating MT: Problems
  • Asking humans to judge sentences on a 5-point
    scale for 10 factors takes time and money (weeks
    or months!)
  • We can't build language engineering systems if we
    can only evaluate them once every quarter!
  • We need a metric that we can run every time we
    change our algorithm.
  • It would be OK if it weren't perfect, as long as it
    tended to correlate with the expensive human
    metrics, which we could still run quarterly.

6
BiLingual Evaluation Understudy (BLEU; Papineni,
2001)
http://www.research.ibm.com/people/k/kishore/RC22176.pdf
  • Automatic technique, but…
  • Requires the pre-existence of human (reference)
    translations
  • Approach:
  • Produce a corpus of high-quality human translations
  • Judge closeness numerically (word-error rate)
  • Compare n-gram matches between the candidate
    translation and 1 or more reference translations

7
Bleu Comparison
Chinese-English Translation Example
Candidate 1: It is a guide to action which ensures
that the military always obeys the commands of the
party.
Candidate 2: It is to insure the troops forever
hearing the activity guidebook that party direct.
Reference 1: It is a guide to action that ensures
that the military will forever heed Party
commands.
Reference 2: It is the guiding principle which
guarantees the military forces always being under
the command of the Party.
Reference 3: It is the practical guide for the
army always to heed the directions of the party.
8
How Do We Compute Bleu Scores?
  • Intuition: What percentage of words in the
    candidate occurred in some human translation?
  • Proposal: count up the # of candidate translation
    words (unigrams) found in any reference
    translation, and divide by the total # of words in
    the candidate translation
  • But we can't just count the total # of overlapping
    N-grams!
  • Candidate: the the the the the the the
  • Reference 1: The cat is on the mat
  • Solution: A reference word should be considered
    exhausted after a matching candidate word is
    identified.

9
Modified n-gram precision
  • For each word, compute:
  • (1) the maximum number of times it occurs in any
    single reference translation
  • (2) the number of times it occurs in the candidate
    translation
  • Instead of using count (2), use the minimum of (1)
    and (2), i.e. clip the counts at the maximum for
    the reference translations
  • Now use that modified count.
  • And divide by the number of candidate words.
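The clipping recipe above can be sketched as a short Python function (a minimal sketch of modified n-gram precision only, not the full BLEU score):

```python
from collections import Counter

def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision: each candidate n-gram is credited at
    most as many times as it appears in any single reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate)
    # For each n-gram, the maximum count over all references (the clip).
    max_ref = Counter()
    for ref in references:
        for gram, c in ngrams(ref).items():
            max_ref[gram] = max(max_ref[gram], c)
    clipped = sum(min(c, max_ref[gram]) for gram, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

cand = "the the the the the the the".split()
refs = ["the cat is on the mat".split()]
print(modified_precision(cand, refs))  # 2/7: only two 'the's get credit
```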

10
Modified Unigram Precision: Candidate 1
It(1) is(1) a(1) guide(1) to(1) action(1)
which(1) ensures(1) that(2) the(4) military(1)
always(1) obeys(0) the commands(1) of(1) the
party(1)
Reference 1: It is a guide to action that ensures
that the military will forever heed Party
commands.
Reference 2: It is the guiding principle which
guarantees the military forces always being under
the command of the Party.
Reference 3: It is the practical guide for the
army always to heed the directions of the party.
What's the answer?
17/18
11
Modified Unigram Precision: Candidate 2
It(1) is(1) to(1) insure(0) the(4) troops(0)
forever(1) hearing(0) the activity(0)
guidebook(0) that(2) party(1) direct(0)
Reference 1: It is a guide to action that ensures
that the military will forever heed Party
commands.
Reference 2: It is the guiding principle which
guarantees the military forces always being under
the command of the Party.
Reference 3: It is the practical guide for the
army always to heed the directions of the party.
What's the answer?
8/14
12
Modified Bigram Precision: Candidate 1
It is(1) is a(1) a guide(1) guide to(1) to
action(1) action which(0) which ensures(0)
ensures that(1) that the(1) the military(1)
military always(0) always obeys(0) obeys the(0)
the commands(0) commands of(0) of the(1) the
party(1)
Reference 1: It is a guide to action that ensures
that the military will forever heed Party
commands.
Reference 2: It is the guiding principle which
guarantees the military forces always being under
the command of the Party.
Reference 3: It is the practical guide for the
army always to heed the directions of the party.
What's the answer?
10/17
13
Modified Bigram Precision: Candidate 2
It is(1) is to(0) to insure(0) insure the(0) the
troops(0) troops forever(0) forever hearing(0)
hearing the(0) the activity(0) activity
guidebook(0) guidebook that(0) that party(0)
party direct(0)
Reference 1: It is a guide to action that ensures
that the military will forever heed Party
commands.
Reference 2: It is the guiding principle which
guarantees the military forces always being under
the command of the Party.
Reference 3: It is the practical guide for the
army always to heed the directions of the party.
What's the answer?
1/13
14
Catching Cheaters
Candidate: the(2) the(0) the(0) the(0) the(0) the(0) the(0)
Reference 1: The cat is on the mat
Reference 2: There is a cat on the mat
What's the unigram answer?
2/7
What's the bigram answer?
0/7
15
Bleu distinguishes human from machine translations
16
Bleu problems with sentence length
  • Candidate: of the
  • Solution: a brevity penalty prefers candidate
    translations which are the same length as one of
    the references

Reference 1: It is a guide to action that ensures
that the military will forever heed Party
commands.
Reference 2: It is the guiding principle which
guarantees the military forces always being under
the command of the Party.
Reference 3: It is the practical guide for the
army always to heed the directions of the party.
Problem: the modified unigram precision of "of the"
is 2/2, and its bigram precision is 1/1!
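The fix can be sketched as follows; in the BLEU paper the brevity penalty is computed over the whole corpus, but a per-sentence sketch shows the idea (the reference lengths below are illustrative):

```python
import math

def brevity_penalty(cand_len, ref_lens):
    """BLEU-style brevity penalty: 1.0 if the candidate is at least as
    long as the closest reference, else an exponential penalty."""
    r = min(ref_lens, key=lambda n: (abs(n - cand_len), n))  # closest ref
    return 1.0 if cand_len >= r else math.exp(1.0 - r / cand_len)

# The two-word candidate "of the" against 16-18-word references:
print(brevity_penalty(2, [16, 18, 16]))   # tiny: exp(1 - 16/2) = exp(-7)
print(brevity_penalty(17, [16, 18, 16]))  # 1.0: no penalty
```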
17
Statistical MT
  • Fidelity and fluency
  • Best translation: combine both factors
  • Developed by researchers who were originally in
    speech recognition at IBM
  • Called the IBM model

18
The IBM model
  • Hmm, those two factors might look familiar…
  • Yup, it's Bayes' rule
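Spelled out, the two factors combine via Bayes' rule (dropping the constant denominator P(S)):

```latex
\hat{T} \;=\; \arg\max_{T} P(T \mid S)
        \;=\; \arg\max_{T} \frac{P(S \mid T)\, P(T)}{P(S)}
        \;=\; \arg\max_{T} \underbrace{P(S \mid T)}_{\text{faithfulness}}\;
              \underbrace{P(T)}_{\text{fluency}}
```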

19
Fluency: P(T)
  • How do we measure that this sentence:
  • "That car was almost crash onto me"
  • is less fluent than this one:
  • "That car almost hit me."
  • Answer: language models (N-grams!)
  • For example P(hit | almost) >> P(was | almost)
  • But we can use any other more sophisticated model
    of grammar
  • Advantage: this is monolingual knowledge!
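As a sketch of how an N-gram fluency score separates the two sentences, here is a toy add-one-smoothed bigram model (the counts are invented for illustration):

```python
import math
from collections import Counter

def bigram_logprob(sentence, bigrams, unigrams, vocab_size):
    """Fluency score from an add-one-smoothed bigram model:
    P(w | prev) = (C(prev, w) + 1) / (C(prev) + V)."""
    words = ["<s>"] + sentence.split()
    return sum(math.log((bigrams[(p, w)] + 1) / (unigrams[p] + vocab_size))
               for p, w in zip(words, words[1:]))

# Invented toy counts: 'almost hit' has been seen, 'almost crash' has not.
bigrams = Counter({("<s>", "that"): 3, ("that", "car"): 3,
                   ("car", "almost"): 2, ("almost", "hit"): 2,
                   ("hit", "me"): 2})
unigrams = Counter({"<s>": 3, "that": 3, "car": 3, "almost": 2, "hit": 2})
V = 10

fluent = bigram_logprob("that car almost hit me", bigrams, unigrams, V)
clumsy = bigram_logprob("that car was almost crash onto me",
                        bigrams, unigrams, V)
print(fluent > clumsy)  # True: the model prefers the fluent sentence
```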

20
Faithfulness: P(S|T)
  • French: ça me plaît ("that me pleases")
  • English:
  • that pleases me - most fluent
  • I like it
  • I'll take that one
  • How do we quantify this?
  • Intuition: the degree to which words in one
    sentence are plausible translations of words in
    the other sentence
  • Product of probabilities that each word in the
    target sentence would generate each word in the
    source sentence.
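This product-of-probabilities intuition can be sketched with an invented toy translation table (the probabilities and the one-to-one word pairing are illustrative only):

```python
# Invented toy table: t(source_word | target_word).
t = {("ça", "that"): 0.8, ("me", "me"): 0.9, ("plait", "pleases"): 0.7,
     ("ça", "it"): 0.3, ("me", "I"): 0.2, ("plait", "like"): 0.1}

def faithfulness(source, target):
    """P(S|T) as a product of per-word translation probabilities,
    pairing the i-th source word with the i-th target word."""
    p = 1.0
    for s, tgt in zip(source, target):
        p *= t.get((s, tgt), 1e-6)  # tiny floor for unseen pairs
    return p

src = ["ça", "me", "plait"]
print(round(faithfulness(src, ["that", "me", "pleases"]), 3))  # 0.504
print(round(faithfulness(src, ["it", "I", "like"]), 3))        # 0.006
```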

21
Faithfulness: P(S|T)
  • We need to know, for every target language word,
    the probability of it mapping to every source
    language word.
  • How do we learn these probabilities?
  • Parallel texts!
  • We often have two texts that are translations of
    each other
  • If we knew which word in the Source Text mapped to
    each word in the Target Text, we could just count!

22
Faithfulness: P(S|T)
  • Sentence alignment:
  • Figuring out which source language sentence maps
    to which target language sentence
  • Word alignment:
  • Figuring out which source language word maps to
    which target language word

23
Big Point about Faithfulness and Fluency
  • The job of the faithfulness model P(S|T) is just
    to model the bag of words: which words come from,
    say, English to Spanish.
  • P(S|T) doesn't have to worry about internal facts
    about Spanish word order; that's the job of P(T)
  • P(T) can do bag generation: put the following
    words in order:
  • Have programming a seen never I language better
  • Actual the hashing is since not collision-free
    usually the is less perfectly the of somewhat
    capacity table

24
P(T) and bag generation: the answer
  • Usually the actual capacity of the table is
    somewhat less, since the hashing is not
    collision-free

25
A motivating example
  • Japanese phrase: 2000nen taio
  • 2000nen:
  • 2000 - highest probability
  • Y2K
  • 2000 years
  • 2000 year
  • Taio:
  • correspondence - highest probability
  • corresponding
  • equivalent
  • tackle
  • dealing with
  • deal with

P(S|T) alone prefers "2000 correspondence";
adding P(T) might produce the correct "Dealing
with Y2K"
26
More formally: The IBM Model
  • Let's flesh out these intuitions about P(S|T) and
    P(T) a bit.
  • Many of the next slides are drawn from Kevin
    Knight's fantastic "A Statistical MT Tutorial
    Workbook"!

27
IBM Model 3 as a probabilistic version of Direct MT
  • We translate English into Spanish as follows:
  • Replace the words in the English sentence with
    Spanish words
  • Scramble around the words to look like Spanish
    order
  • But we can't propose that English words are
    replaced by Spanish words one-for-one, because
    translations aren't the same length.

28
IBM Model 3 (from Knight 1999)
  • For each word ei in the English sentence, choose a
    fertility φi. The choice of φi depends only on
    ei, not on other words or φ's.
  • For each word ei, generate φi Spanish words. The
    choice of Spanish word depends only on the English
    word ei, not on the English context or on any
    Spanish words.
  • Permute all the Spanish words. Each Spanish word
    gets assigned an absolute target position slot
    (1, 2, 3, etc.). The choice of Spanish word
    position depends only on the absolute position of
    the English word generating it.

29
Translation as String Rewriting (from Knight 1999)
  • Mary did not slap the green witch
  • Assign fertilities: 1 - copy over the word, 2 -
    copy it twice, etc.; 0 - delete it
  • Mary not slap slap slap the the green witch
  • Replace English words with Spanish words
    one-for-one
  • Mary no daba una botefada a la verde bruja
  • Permute the words
  • Mary no daba una botefada a la bruja verde
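The three rewriting steps can be sketched in Python; the fertility table, word choices, and the fixed permutation standing in for the distortion step are all hard-coded to mirror this slide's example (they are not trained parameters):

```python
# Toy tables mirroring the slide's example (hard-coded, not trained).
FERTILITY = {"Mary": 1, "did": 0, "not": 1, "slap": 3,
             "the": 2, "green": 1, "witch": 1}
TRANSLATE = {"Mary": ["Mary"], "not": ["no"],
             "slap": ["daba", "una", "botefada"],
             "the": ["a", "la"], "green": ["verde"], "witch": ["bruja"]}
PERMUTE = [0, 1, 2, 3, 4, 5, 6, 8, 7]  # source slot -> target slot

def model3_rewrite(english):
    # Step 1: apply fertilities (0 deletes a word, n copies it n times).
    copies = [w for w in english.split() for _ in range(FERTILITY[w])]
    # Step 2: replace each English copy with a Spanish word, one-for-one.
    seen = {}
    spanish = []
    for w in copies:
        spanish.append(TRANSLATE[w][seen.get(w, 0)])
        seen[w] = seen.get(w, 0) + 1
    # Step 3: permute into Spanish word order.
    out = [None] * len(spanish)
    for src, tgt in enumerate(PERMUTE):
        out[tgt] = spanish[src]
    return " ".join(out)

print(model3_rewrite("Mary did not slap the green witch"))
# Mary no daba una botefada a la bruja verde
```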

30
Model 3 P(S|T) training parameters
  • What are the parameters for this model? Just look
    at the dependencies:
  • Words: P(casa|house)
  • Fertilities: n(1|house) - the probability that
    "house" will produce 1 Spanish word whenever
    "house" appears.
  • Distortions: d(5|2) - the probability that the
    English word in position 2 of the English sentence
    generates the Spanish word in position 5 of the
    Spanish translation
  • Actually, distortions are d(5|2,4,6), where 4 is
    the length of the English sentence and 6 is the
    Spanish length
  • Remember, P(S|T) doesn't have to model fluency

31
Model 3: last twist
  • Imagine some Spanish words are spurious: they
    appear in Spanish even though they weren't in the
    English original
  • Like function words: we generated "a la" from
    "the" by giving "the" fertility 2
  • Instead, we could give "the" fertility 1, and
    generate "a" spuriously
  • Do this by pretending every English sentence
    contains the invisible word NULL as word 0.
  • Then parameters like t(a|NULL) give the
    probability of the word "a" being generated
    spuriously from NULL

32
Spurious words
  • We could imagine having n(3|NULL) (the probability
    of there being exactly 3 spurious words in a
    Spanish translation)
  • Instead of n(0|NULL), n(1|NULL), … n(25|NULL), we
    have a single parameter p1
  • After assigning fertilities to the non-NULL
    English words, we want to generate (say) z Spanish
    words.
  • As we generate each of the z words, we optionally
    toss in a spurious Spanish word with probability p1
  • Probability of not tossing in a spurious word:
    p0 = 1 - p1

33
Distortion probabilities for spurious words
  • Can't just have d(5|0,4,6), i.e. the chance that a
    NULL-generated word will end up in position 5.
  • Why? These are spurious words! They could occur
    anywhere!! Too hard to predict.
  • Instead:
  • Use the normal-word distortion parameters to
    choose positions for the normally-generated
    Spanish words
  • Put the NULL-generated words into the empty slots
    left over
  • If there are three NULL-generated words and three
    empty slots, then there are 3!, or six, ways of
    slotting them all in
  • We'll assign a probability of 1/6 to each way

34
Real Model 3
  • For each word ei in the English sentence, choose a
    fertility φi with probability n(φi|ei)
  • Choose the number φ0 of spurious Spanish words to
    be generated from e0 = NULL, using p1 and the sum
    of fertilities from step 1
  • Let m be the sum of fertilities for all words,
    including NULL
  • For each i = 0, 1, 2, …, l and k = 1, 2, …, φi:
  • choose a Spanish word τik with probability
    t(τik|ei)
  • For each i = 1, 2, …, l and k = 1, 2, …, φi:
  • choose a target Spanish position πik with
    probability d(πik|i,l,m)
  • For each k = 1, 2, …, φ0, choose a position π0k
    from the φ0 - k + 1 remaining vacant positions in
    1, 2, …, m, for a total probability of 1/φ0!
  • Output the Spanish sentence with word τik in
    position πik (0 ≤ i ≤ l, 1 ≤ k ≤ φi)

35
String rewriting
  • Mary did not slap the green witch (input)
  • Mary not slap slap slap the green witch (choose
    fertilities)
  • Mary not slap slap slap NULL the green witch
    (choose number of spurious words)
  • Mary no daba una botefada a la verde bruja
    (choose translations)
  • Mary no daba una botefada a la bruja verde
    (choose target positions)

36
Model 3 parameters
  • n, t, p, d
  • If we had English strings and step-by-step
    rewritings into Spanish, we could:
  • Compute n(0|did) by locating every instance of
    "did" and seeing what happens to it during the
    first rewriting step
  • If "did" appeared 15,000 times and was deleted
    during the first rewriting step 13,000 times,
    then n(0|did) = 13/15

37
Alignments
  • NULL And the program has been implemented
  • Le programme a été mis en application
  • (the slide drew alignment links connecting each
    French word to the English word that generated it)
  • If we had lots of alignments like this:
  • n(0|did) - how many times does "did" connect to
    zero French words?
  • t(maison|house) - how many of all the French words
    generated by "house" were "maison"?
  • d(5|2,4,6) - out of all the times the word in
    position 2 moved somewhere, how many times did it
    move to position 5?

38
Where to get alignments
  • It turns out we can bootstrap alignments
  • If we just have a bilingual corpus:
  • 1. Assume some startup values for n, d, t, etc.
  • 2. Use the values for n, d, t, etc. to run Model 3
    in forced alignment mode, i.e. to pick the best
    word alignments between sentences
  • 3. Use these alignments to retrain n, d, t, etc.
  • 4. Go to step 2
  • This is called the Expectation-Maximization or EM
    algorithm
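The loop above is easiest to see in IBM Model 1, which drops fertility and distortion and keeps only the word-translation table t; a minimal EM sketch on an invented two-sentence corpus:

```python
from collections import defaultdict

# Invented toy parallel corpus: (English sentence, Spanish sentence).
corpus = [(["green", "house"], ["casa", "verde"]),
          (["the", "house"], ["la", "casa"])]

t = defaultdict(lambda: 1.0 / 3)  # step 1: uniform startup values for t(f|e)

for _ in range(10):
    count = defaultdict(float)   # expected counts for (f, e) pairs
    total = defaultdict(float)   # expected counts for e
    for e_sent, f_sent in corpus:
        for f in f_sent:
            # E-step: distribute one count for f over all English words
            # in proportion to the current t(f|e) (soft alignment).
            z = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                count[(f, e)] += t[(f, e)] / z
                total[e] += t[(f, e)] / z
    # M-step: re-estimate t(f|e) from the expected counts.
    for f, e in count:
        t[(f, e)] = count[(f, e)] / total[e]

# 'casa' co-occurs with 'house' in both sentences, so EM pushes
# t(casa|house) toward 1 while t(verde|house) and t(la|house) shrink.
print(round(t[("casa", "house")], 2))
```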

39
Summary
  • Intro and a little history
  • Language Similarities and Divergences
  • Four main MT Approaches
  • Transfer
  • Interlingua
  • Direct
  • Statistical
  • Evaluation

40
Classes
  • LINGUIST 139M/239M. Human and Machine
    Translation. (Martin Kay)
  • CS 224N. Natural Language Processing (Chris
    Manning)