1
Advanced Techniques in NLP
  • Machine Translation III
  • Statistical MT

2
Approaching MT
  • There are many different ways of approaching the
    problem of MT.
  • The choice of approach is complex and depends
    upon:
  • Task requirements
  • Human resources
  • Linguistic resources

3
Criterial Issues
  • Do we want a translation system for one language
    pair or for many language pairs?
  • Can we assume a constrained vocabulary or do we
    need to deal with arbitrary text?
  • What resources exist for the languages that we
    are dealing with?
  • How long will it take to develop the resources,
    and what human resources will be required?

4
Parallel Data
  • Lots of translated text is available: hundreds of
    millions of words for some language pairs
  • a book has a few hundred thousand words
  • an educated person may read 10,000 words a day
  • 3.5 million words a year
  • 300 million in a lifetime
  • Computers can see more translated text than
    humans read in a lifetime
  • Machines can learn how to translate foreign
    languages.
  • (Koehn 2006)

5
Statistical Translation
  • Robust
  • Domain independent
  • Extensible
  • Does not require language specialists
  • Does require parallel texts
  • Uses noisy channel model of translation

6
Noisy Channel Model: Sentence Translation (Brown
et al. 1990)
[Diagram: a source sentence passes through a noisy channel to yield the target sentence]
7
Statistical Modelling
  • Learn P(f|e) from a parallel corpus
  • Not sufficient data to estimate P(f|e) directly
  • from Koehn 2006

8
The Problem of Translation
  • Given a sentence T of the target language, seek
    the source sentence S from which a translator
    produced T, i.e.
  • find S that maximises P(S|T)
  • By Bayes' theorem:
    P(S|T) = P(S) × P(T|S) / P(T)
  • whose denominator is independent of S.
  • Hence it suffices to maximise P(S) × P(T|S)

9
The Three Components of a Statistical MT model
  1. Method for computing language model probabilities
     P(S)
  2. Method for computing translation probabilities
     P(T|S)
  3. Method for searching amongst source sentences for
     one that maximises P(S) × P(T|S) (a minimal sketch
     follows)
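
To make the decomposition concrete, here is a minimal sketch in Python of how the three components fit together; lm_prob, tm_prob, and the candidate set are hypothetical stand-ins for a real language model, translation model, and search procedure.

  # Noisy-channel sketch: choose the source sentence S that
  # maximises P(S) * P(T|S) for the observed target sentence T.
  def decode(target, candidate_sources, lm_prob, tm_prob):
      # Brute-force search over an (assumed) candidate set;
      # a real decoder searches this space incrementally.
      return max(candidate_sources,
                 key=lambda s: lm_prob(s) * tm_prob(target, s))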

10
A Statistical MT System
[Diagram: a Source Language Model supplies P(S) and a Translation Model supplies P(T|S); given T, the Decoder outputs the S that maximises P(S) × P(T|S)]
11
Three Kinds of Model
12
Language Models based on N-Grams of Words
  • General: P(s1 s2 ... sn) = P(s1) P(s2|s1)
    ... P(sn|s1 ... s(n-1))
  • Trigram: P(s1 s2 ... sn) = P(s1) P(s2|s1) P(s3|s1,s2)
    ... P(sn|s(n-2),s(n-1))
  • Bigram: P(s1 s2 ... sn) = P(s1) P(s2|s1)
    ... P(sn|s(n-1)) (a minimal bigram sketch follows)
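
As an illustration, here is a minimal sketch of a bigram language model estimated with maximum-likelihood counts (no smoothing); the function names and toy corpus handling are assumptions for illustration.

  from collections import defaultdict

  def train_bigram_lm(sentences):
      # Maximum-likelihood estimate: P(w2|w1) = count(w1 w2) / count(w1).
      unigram = defaultdict(int)
      bigram = defaultdict(int)
      for s in sentences:
          words = ["<s>"] + s.split() + ["</s>"]
          for w1, w2 in zip(words, words[1:]):
              unigram[w1] += 1
              bigram[(w1, w2)] += 1
      return lambda w1, w2: bigram[(w1, w2)] / unigram[w1] if unigram[w1] else 0.0

  def sentence_prob(p, sentence):
      # P(s1 ... sn) = P(s1|<s>) * P(s2|s1) * ... * P(</s>|sn)
      words = ["<s>"] + sentence.split() + ["</s>"]
      prob = 1.0
      for w1, w2 in zip(words, words[1:]):
          prob *= p(w1, w2)
      return prob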

13
Syntax Based Language Models
  • Good syntax tree ⇒ good English
  • Allows for long-distance constraints
  • Left sentence (in the slide's example) is preferred
    by the syntax-based model

14
Word-Based Translation Models
  • Translation process is decomposed into smaller
    steps
  • Each step is tied to individual words
  • Based on IBM Models (Brown et al., 1993)
  • from Koehn 2006

15
Word TM derived from Bitext
  • ENGLISH
  • the cat sleeps
  • the dog sleeps
  • the horse eats
  • FRENCH
  • le chat dort
  • le chien dort
  • le cheval mange

16
le chat dort / the cat sleeps

Alignment counts after the first sentence pair:

          the  cat  dog  horse  sleeps  eats
  le       1    1    -     -      1      -
  chat     1    1    -     -      1      -
  chien    -    -    -     -      -      -
  cheval   -    -    -     -      -      -
  dort     1    1    -     -      1      -
  mange    -    -    -     -      -      -
17
le chien dort / the dog sleeps

Counts after the second sentence pair:

          the  cat  dog  horse  sleeps  eats
  le       2    1    1     -      2      -
  chat     1    1    -     -      1      -
  chien    1    -    1     -      1      -
  cheval   -    -    -     -      -      -
  dort     2    1    1     -      2      -
  mange    -    -    -     -      -      -
18
le cheval mange / the horse eats

Counts after the third sentence pair, with P(t|s) for s = le
(a counting sketch follows below):

          the  cat  dog  horse  sleeps  eats
  le       3    1    1     1      2      1
  chat     1    1    -     -      1      -
  chien    1    -    1     -      1      -
  cheval   1    -    -     1      -      1
  dort     2    1    1     -      2      -
  mange    1    -    -     1      -      1

  P(t|le)  3/9  1/9  1/9   1/9    2/9    1/9
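
A minimal sketch of how such a table is built: count co-occurrences of source and target words over the bitext, then normalise each source word's row to obtain P(t|s). The toy corpus is the one from the slides.

  from collections import defaultdict

  bitext = [("le chat dort", "the cat sleeps"),
            ("le chien dort", "the dog sleeps"),
            ("le cheval mange", "the horse eats")]

  # Count how often each (source word, target word) pair co-occurs
  # within aligned sentence pairs.
  counts = defaultdict(lambda: defaultdict(int))
  for src, tgt in bitext:
      for s in src.split():
          for t in tgt.split():
              counts[s][t] += 1

  # Normalise each row to a probability distribution P(t|s).
  p = {s: {t: c / sum(row.values()) for t, c in row.items()}
       for s, row in counts.items()}

  print(p["le"])   # the: 3/9, cat: 1/9, ..., matching the table above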
19
Parameter Estimation
  • Based on counting occurrences within monolingual
    and bilingual data.
  • For language model, we need only source language
    text.
  • For translation model, we need pairs of sentences
    that are translations of each other.
  • Use the EM (Expectation Maximisation) algorithm (Baum
    1972) to optimise model parameters.

20
EM Algorithm
  • Word alignments for the sentence pair ("a b c", "x y
    z") are formed from arbitrary pairings of words from
    the two sentences and include (a.x, b.y, c.z),
    (a.z, b.y, c.x), etc.
  • There is a large number of possible alignments,
    since we also allow, e.g., (ab.x, 0.y, c.z)

21
EM Algorithm
  • Make initial estimate of parameters. This can be
    used to compute the probability of any possible
    word alignment.
  • Re-estimate parameters by ranking each possible
    alignment by its probability according to initial
    guess.
  • Repeated iterations assign ever greater
    probability to the set of sentences actually
    observed.
  • Algorithm leads to a local maximum of the
    probability of observed sentence pairs as a
    function of the model parameters
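
A compact sketch of this loop for the simplest case (word translation probabilities only, as in IBM Model 1, with no fertility, distortion, or NULL word); this illustrates the E and M steps rather than reproducing Brown et al.'s full model.

  from collections import defaultdict

  def ibm_model1_em(bitext, iterations=10):
      # Initial estimate: uniform translation probabilities P(t|s).
      src_vocab = {s for src, _ in bitext for s in src.split()}
      tgt_vocab = {t for _, tgt in bitext for t in tgt.split()}
      p = {s: {t: 1.0 / len(tgt_vocab) for t in tgt_vocab} for s in src_vocab}
      for _ in range(iterations):
          count = defaultdict(lambda: defaultdict(float))
          total = defaultdict(float)
          for src, tgt in bitext:
              src_words, tgt_words = src.split(), tgt.split()
              for t in tgt_words:
                  # E-step: distribute a fractional count for t over the
                  # source words, weighted by the current estimates.
                  norm = sum(p[s][t] for s in src_words)
                  for s in src_words:
                      frac = p[s][t] / norm
                      count[s][t] += frac
                      total[s] += frac
          # M-step: re-estimate P(t|s) from the expected counts.
          for s in count:
              for t in count[s]:
                  p[s][t] = count[s][t] / total[s]
      return p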

22
Parameters for IBM Translation Model
  • Word Translation Probability P(t|s): probability
    that source word s is translated as target word
    t.
  • Fertility P(n|s): probability that source word s
    is translated by n target words (25 ≥ n ≥ 0).
  • Distortion P(i|j,l): probability that the source word
    at position j is translated by the target word at
    position i in a target sentence of length l.

23
Experiment 1 (Brown et al. 1990)
  • Hansard corpus: 40,000 pairs of sentences, approx.
    800,000 words in each language.
  • Considered the 9,000 most common words in each
    language.
  • Assumptions (initial parameter values):
  • each of the 9,000 target words equally likely as a
    translation of each of the source words
  • each of the fertilities from 0 to 25 equally
    likely for each of the 9,000 source words
  • each target position equally likely given each
    source position and target length

24
English: the

  French     Probability
  le            .610
  la            .178
  l'            .083
  les           .023
  ce            .013
  il            .012
  de            .009
  à             .007
  que           .007

  Fertility  Probability
  1             .871
  0             .124
  2             .004

25
English: not

  French        Probability
  pas              .469
  ne               .460
  non              .024
  pas du tout      .003
  faux             .003
  plus             .002
  ce               .002
  que              .002
  jamais           .002

  Fertility  Probability
  2             .758
  0             .133
  1             .106

26
English: hear

  French     Probability
  bravo         .992
  entendre      .005
  entendu       .002
  entends       .001

  Fertility  Probability
  0             .584
  1             .416

27
Sentence Translation Probability
  • Given a translation model for words, we can compute
    the translation probability of a sentence, taking the
    parameters into account (a worked sketch follows):
  • P(Jean aime Marie | John loves Mary) =
    P(Jean|John) × P(1|John) × P(1|1,3) ×
    P(aime|loves) × P(1|loves) × P(2|2,3) ×
    P(Marie|Mary) × P(1|Mary) × P(3|3,3)
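
A minimal sketch of this computation, with the three parameter tables represented as dictionaries; all probability values here are hypothetical placeholders, not Brown et al.'s estimates.

  # Hypothetical tables: word translation P(t|s), fertility P(n|s),
  # and distortion P(i|j,l); values are for illustration only.
  t_prob = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8,
            ("Marie", "Mary"): 0.9}
  fert = {("John", 1): 0.9, ("loves", 1): 0.9, ("Mary", 1): 0.9}
  dist = {(1, 1, 3): 0.8, (2, 2, 3): 0.8, (3, 3, 3): 0.8}

  # P(Jean aime Marie | John loves Mary) under the one-to-one alignment:
  pairs = [("Jean", "John"), ("aime", "loves"), ("Marie", "Mary")]
  prob = 1.0
  for i, (t, s) in enumerate(pairs, start=1):
      prob *= t_prob[(t, s)] * fert[(s, 1)] * dist[(i, i, 3)]
  print(prob)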

28
Flaws in Word-Based Translation
  • Model handles many-to-one translations P(t t t|s)
    but not one-to-many translations P(t|s s s)
  • e.g.
  • Zeitmangel erschwert das Problem .
  • (gloss: lack of time makes more difficult the
    problem .)
  • Correct translation: Lack of time makes the
    problem more difficult.
  • MT output: Time makes the problem.
  • (from Koehn 2006)

29
Flaws in Word-Based Translation (2)
  • Phrasal translation P(t t t|s s s s)
  • e.g. erübrigt sich / there is no point in
  • Eine Diskussion erübrigt sich demnach .
  • (gloss: a discussion is made unnecessary itself
    therefore .)
  • Correct translation: Therefore, there is no point
    in a discussion.
  • MT output: A debate turned therefore .
  • (from Koehn 2006)

30
Flaws in Word-Based Translation (3)
  • Syntactic transformations
  • Example: object/subject reordering
  • Den Vorschlag lehnt die Kommission ab
  • (gloss: the proposal rejects the commission off)
  • Correct translation: The commission rejects the
    proposal .
  • MT output: The proposal rejects the commission.
  • (from Koehn 2006)

31
Phrase Based Translation Models
  • Foreign input is segmented into phrases.
  • Phrases are any sequences of words, not
    necessarily linguistically motivated.
  • Each phrase is translated into English.
  • Phrases are reordered (a minimal sketch follows).
  • (from Koehn 2006)
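
A minimal sketch of these three steps with a hypothetical phrase table; segmentation and reordering are fixed by hand here, whereas a real decoder searches over both.

  # Hypothetical phrase table: foreign phrase -> English phrase.
  phrase_table = {"das ist": "this is", "ein": "a",
                  "kleines haus": "small house"}

  # 1. Segment the foreign input into phrases (fixed by hand here).
  segments = ["das ist", "ein", "kleines haus"]

  # 2. Translate each phrase; 3. reorder (identity order here).
  translation = " ".join(phrase_table[seg] for seg in segments)
  print(translation)   # this is a small house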

32
Syntax Based Translation Models
33
Word-Based Decoding: searching for the best
translation (Brown 1990)
  • Maintain a list of hypotheses.
  • Initial hypothesis: (Jean aime Marie | )
  • Search proceeds iteratively.
  • At each iteration we extend the most promising
    hypotheses with additional words:
    (Jean aime Marie | John(1))
    (Jean aime Marie | loves(2))
    (Jean aime Marie | Mary(3))
    (Jean aime Marie | Jean(1))
  • Parenthesised numbers indicate the corresponding
    position in the target sentence

34
Phrase-Based Decoding
  • Build translation left to right
  • select foreign word(s) to be translated
  • find English phrase translation
  • add English phrase to end of partial translation
  • Koehn 2006

35
Decoding Process
  • one-to-many translation
  • Koehn 2006

36
Decoding Process
  • many-to-one translation
  • Koehn 2006

37
Decoding Process
  • translation finished
  • Koehn 2006

38
Hypothesis Expansion
  • Start with the empty hypothesis
  • e: no English words
  • f: no foreign words covered
  • p: probability 1
  • (Koehn 2006)
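
A minimal sketch of the hypothesis record and one expansion step; the field names, back-pointer, and scoring are illustrative assumptions, not Pharaoh's internals.

  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class Hypothesis:
      english: str                          # e: English produced so far
      covered: frozenset                    # f: foreign positions covered
      prob: float                           # p: probability so far
      back: Optional["Hypothesis"] = None   # pointer to predecessor

  def expand(hyp, foreign_positions, english_phrase, phrase_prob):
      # Extend a hypothesis: cover more foreign words, append the
      # English phrase, and multiply in the phrase probability.
      return Hypothesis((hyp.english + " " + english_phrase).strip(),
                        hyp.covered | frozenset(foreign_positions),
                        hyp.prob * phrase_prob,
                        hyp)

  empty = Hypothesis("", frozenset(), 1.0)   # the empty hypothesis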

39
Hypothesis Expansion
40
Hypothesis Expansion
  • further hypothesis expansion
  • Koehn 2006

41
Decoding Process
  • Adding more hypotheses leads to an explosion of the
    search space.
  • Koehn 2006

42
Hypothesis Recombination
  • Sometimes different choices of hypothesis lead to
    the same translation result.
  • Such paths can be combined.
  • Koehn 2006

43
Hypothesis Recombination
  • Drop weaker path
  • Keep pointer from weaker path
  • Koehn 2006
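
A minimal sketch of recombination: hypotheses that agree on everything that matters for future expansion are merged, keeping the higher-probability path. Using the foreign coverage plus the last two English words as the state is an assumption (it matches a trigram language model context); the Hypothesis objects are those of the earlier sketch.

  def recombine(hypotheses):
      # Key: the state that determines all future expansions.
      best = {}
      for h in hypotheses:
          key = (h.covered, tuple(h.english.split()[-2:]))
          if key not in best or h.prob > best[key].prob:
              best[key] = h   # drop (or back-point to) the weaker path
      return list(best.values())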

44
Pruning
  • Hypothesis recombination is not sufficient
  • Heuristically discard weak hypotheses early
  • Organise hypotheses in stacks, e.g. by
  • same foreign words covered
  • same number of foreign words covered (Pharaoh
    does this)
  • same number of English words produced
  • Compare hypotheses in stacks, discard bad ones
  • histogram pruning: keep the top n hypotheses in each
    stack (e.g., n = 100)
  • threshold pruning: keep hypotheses that are at
    most α times the cost of the best hypothesis in the
    stack (e.g., α = 0.001); a minimal sketch follows
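
A minimal sketch of both pruning methods applied to one stack of the Hypothesis objects from the earlier sketch; treating scores as probabilities (higher is better) is an assumption, and a real decoder would add a future cost estimate before comparing.

  def prune(stack, n=100, alpha=0.001):
      # Threshold pruning: drop hypotheses whose probability falls
      # below alpha times that of the best hypothesis in the stack.
      best = max(h.prob for h in stack)
      stack = [h for h in stack if h.prob >= alpha * best]
      # Histogram pruning: keep only the top n hypotheses.
      return sorted(stack, key=lambda h: h.prob, reverse=True)[:n]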

45
Hypothesis Stacks
  • Organisation of hypotheses into stacks
  • here based on the number of foreign words translated
  • during translation all hypotheses from one stack
    are expanded
  • expanded hypotheses are placed into new stacks
  • (one hypothesis may feed several stacks, e.g. with
    one-to-many translation)
  • (Koehn 2006)

46
Comparing Hypotheses Covering the Same Number of
Foreign Words
  • A hypothesis that covers an easy part of the
    sentence is preferred
  • Need to consider the future cost of uncovered parts
  • Should take account of one-to-many translation
  • (Koehn 2006)

47
Future Cost Estimation
  • Use future cost estimates when pruning hypotheses
  • For each maximal contiguous uncovered span:
  • look up the precomputed future cost of the span
  • add it to the actually accumulated cost of the
    hypothesis when pruning
  • (Koehn 2006); a minimal sketch follows
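
A minimal sketch of precomputing a future cost table by dynamic programming; best_option_prob(i, j) is a hypothetical lookup returning the best translation-option probability for the foreign span [i, j), and real estimates would also include a language model component.

  import math

  def future_cost_table(length, best_option_prob):
      # fc[i][j] = estimated cost (negative log probability) of
      # translating the uncovered foreign span [i, j).
      fc = [[math.inf] * (length + 1) for _ in range(length + 1)]
      for width in range(1, length + 1):
          for i in range(length - width + 1):
              j = i + width
              p = best_option_prob(i, j)
              if p:
                  fc[i][j] = -math.log(p)
              # A span may also be covered as two cheaper sub-spans.
              for k in range(i + 1, j):
                  fc[i][j] = min(fc[i][j], fc[i][k] + fc[k][j])
      return fc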

48
Pharaoh
  • A beam search decoder for phrase-based models
  • works with various phrase-based models
  • beam search algorithm
  • time complexity roughly linear with input length
  • good quality takes about 1 second per sentence
  • Very good performance in DARPA/NIST evaluation
  • Freely available for researchers:
    http://www.isi.edu/licensed-sw/pharaoh/
  • Coming soon: open source version of Pharaoh

49
Pharaoh Demo
  • echo 'das ist ein kleines haus' | pharaoh -f
    pharaoh.ini > out
  • Pharaoh v1.2.9, written by Philipp Koehn
  • a beam search decoder for phrase-based
    statistical machine translation models
  • (c) 2002-2003 University of Southern California
  • (c) 2004 Massachusetts Institute of Technology
  • (c) 2005 University of Edinburgh, Scotland
  • loading language model from europarl.srilm
  • loading phrase translation table from
    phrase-table, stored 21, pruned 0, kept 21
  • loaded data structures in 2 seconds
  • reading input sentences
  • translating 1 sentences, translated 1 sentences
    in 0 seconds
  • cat out
  • this is a small house

50
Brown Experiment 2
  • Perform translation using the 1,000 most frequent
    words in the English corpus.
  • Use the 1,700 most frequently used French words in
    translations of sentences completely covered by
    the 1,000-word English vocabulary.
  • 117,000 pairs of sentences completely covered by
    both vocabularies.
  • Parameters of the English language model estimated
    from 570,000 sentences in the English part.

51
Experiment 2 (contd.)
  • 73 French sentences from elsewhere in the corpus
    were tested. Results were classified as:
  • Exact: same as the actual translation
  • Alternate: same meaning
  • Different: a legitimate translation but different
    meaning
  • Wrong: could not be interpreted as a translation
  • Ungrammatical: grammatically deficient
  • Corrections to the last three categories were
    made and keystrokes were counted

52
Results
  Category        Sentences   Percent
  Exact                4          5
  Alternate           18         25
  Different           13         18
  Wrong               11         15
  Ungrammatical       27         37

  Total               73        100
53
Results - Discussion
  • According to Brown et al., the system performed
    successfully 48% of the time (the first three
    categories).
  • 776 keystrokes were needed to repair the output,
    against 1,916 keystrokes to generate all 73
    translations from scratch (about 40% of the effort).
  • According to the authors, the system therefore
    reduces work by 60%.

54
Issues
  • Automatic evaluation methods
  • can computers decide what are good translations?
  • Phrase-based models
  • what are the atomic units of translation?
  • how are they discovered?
  • currently the best method in statistical machine
    translation
  • Discriminative training
  • what methods directly optimise translation
    performance?

55
The Speculative (Koehn 2006)
  • Syntax-based transfer models
  • how can we build models that take advantage of
    syntax?
  • how can we ensure that the output is grammatical?
  • Factored translation models
  • how can we integrate different levels of
    abstraction?

56
Bibliography
  • Statistical MT: Brown et al., "A Statistical
    Approach to MT", Computational Linguistics 16(2),
    1990, pp. 79-85 (search the ACL Anthology)
  • Koehn tutorial (see
    http://www.iccs.inf.ed.ac.uk/~pkoehn/)