1
THE MATHEMATICS OF STATISTICAL MACHINE TRANSLATION
Sriraman M Tallam
2
The Problem
  • The problem of machine translation is discussed.
  • Five statistical models of the translation process are proposed.
  • Algorithms for estimating their parameters are described.
  • The learning process uses pairs of sentences that are translations of one another.
  • Previous work shows statistical methods to be useful in achieving linguistically interesting goals.
  • A natural extension is matching up the words within pairs of aligned sentences.
  • Results show the power of statistical methods in extracting linguistically interesting correlations.

3
Statistical Translation
  • Warren Weaver first suggested the use of statistical techniques for machine translation (Weaver, 1955).
  • Fundamental Equation for Machine Translation

Pr(e | f) = Pr(e) Pr(f | e) / Pr(f)

ê = argmax_e Pr(e) Pr(f | e)
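As a minimal illustration of how the two factors combine in the search (a sketch, not from the original slides: the candidate list and the language_model and translation_model scoring functions are hypothetical inputs):

    import math

    def decode(f, candidates, language_model, translation_model):
        # Choose ê = argmax_e Pr(e) * Pr(f|e); log space avoids underflow.
        return max(
            candidates,
            key=lambda e: math.log(language_model(e))
                        + math.log(translation_model(f, e)),
        )

In practice the candidate set is astronomically large, which is why the search problem is listed below as a challenge in its own right.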
4
Statistical Translation
  • When writing a French sentence, a translator, even a native speaker, conceives an English sentence and then mentally translates it.
  • Machine translation's goal is to find that English sentence.
  • The equation summarizes the three computational challenges presented by statistical translation:
  • Language Model Probability Estimation - Pr(e)
  • Translation Model Probability Estimation - Pr(f | e)
  • Search Problem - maximizing their product
  • Why not reverse the translation models?
  • Class Discussion!!

5
Alignments
  • What is a translation?
  • A pair of strings that are translations of one another:
  • (Qu'aurions-nous pu faire ? | What could we have done ?)
  • What is an alignment?

6
Alignments
  • The mapping in an alignment can range from one-to-one to many-to-many.
  • The alignment in the figure is expressed as
  • (Le programme a été mis en application | And the(1) program(2) has(3) been(4) implemented(5,6,7)).
  • This alignment, though acceptable, has a lower probability:
  • (Le programme a été mis en application | And(1,2,3,4,5,6,7) the program has been implemented).
  • A(e, f) is the set of alignments of the pair (f, e).
  • If e has length l and f has length m, there are 2^(lm) alignments in all, since each of the lm possible word-to-word connections can independently be present or absent (checked in the sketch below).
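A quick sanity check of the 2^(lm) count on a toy pair, enumerating every subset of the l·m possible connections (the word lists are just for illustration):

    from itertools import combinations

    e = ["the", "program"]                    # l = 2
    f = ["le", "programme", "application"]    # m = 3

    # An alignment is any subset of the l*m possible (English, French) connections.
    pairs = [(i, j) for i in range(len(e)) for j in range(len(f))]
    n = sum(1 for k in range(len(pairs) + 1) for _ in combinations(pairs, k))
    assert n == 2 ** (len(e) * len(f))  # 64 alignments for l = 2, m = 3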

7
Cepts
  • What is a cept?
  • It expresses the fact that each word is related to a concept; in a figurative sense, a sentence is a web of concepts woven together.
  • The cepts in the example are "The", "poor", and "don't have any money".
  • There is also the notion of an empty cept.

8
Translation Models
  • Five translation models have been developed.
  • Each model is a recipe for computing Pr(f | e), which is called the likelihood of the translation (f, e).
  • The likelihood is a function of many parameters.
  • The idea is to guess values for these parameters and then apply the EM algorithm iteratively.

9
Translation Models
  • Models 1 and 2:
  • All possible lengths of the French string are equally likely.
  • In Model 1, all connections for each French position are equally likely.
  • In Model 2, the connection probabilities are more realistic.
  • These models often lead to unsatisfactory alignments.
  • Models 3, 4 and 5:
  • No assumptions are made on the length of the French string.
  • Models 3 and 4 make more realistic assumptions about the connection probabilities.
  • Models 1 - 4 are a stepping stone for the training of Model 5.
  • Start with Model 1 for initial estimates and pipe them through the models 2 - 5.

10
Translation Models
  • The likelihood of f given e is a sum over all elements of A(e, f):

Pr(f | e) = Σ_{a ∈ A(e,f)} Pr(f, a | e)

  • Each Pr(f, a | e) is built up in three steps:
  • choose the length of the French string given the English;
  • for each French word position, choose its alignment, given the previous alignments and words;
  • choose the identity of the word at this position, given our knowledge of the previous alignments and words.

Pr(f, a | e) = Pr(m | e) ∏_{j=1}^{m} Pr(a_j | a_1^{j-1}, f_1^{j-1}, m, e) Pr(f_j | a_1^{j}, f_1^{j-1}, m, e)

11
Model 1
  • Assumptions:
  • Pr(m | e) is assumed independent of e and m, i.e. all reasonable lengths of the French string are equally likely: Pr(m | e) = ε.
  • The alignment probability Pr(a_j | a_1^{j-1}, f_1^{j-1}, m, e) depends only on l.
  • All connections are equally likely, and for each French word there are (l + 1) possible connections (the l English words plus the empty cept), so this quantity equals (l + 1)^(-1).
  • Pr(f_j | a_1^{j}, f_1^{j-1}, m, e) = t(f_j | e_aj) is called the translation probability of f_j given e_aj.

12
Model 1
  • The joint likelihood function for Model 1 is

Pr(f, a | e) = ε / (l + 1)^m ∏_{j=1}^{m} t(f_j | e_aj)

  • Summing over alignments, for j = 1 ... m with each a_j ranging over 0 ... l:

Pr(f | e) = ε / (l + 1)^m Σ_{a_1=0}^{l} ... Σ_{a_m=0}^{l} ∏_{j=1}^{m} t(f_j | e_aj)

  • subject to, for each English word e,

Σ_f t(f | e) = 1

(Both quantities are computed in the sketch below.)
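A small sketch of these two quantities (the translation table t is a hypothetical input; the code also checks that the sum over alignments factorizes into ∏_j Σ_i t(f_j | e_i), which is what makes Model 1 tractable):

    from itertools import product
    from math import prod

    def model1_likelihood(f, e, t, eps=1.0):
        # Pr(f|e) under Model 1; e[0] is the NULL word (empty cept).
        l, m = len(e) - 1, len(f)
        # Brute force: sum Pr(f,a|e) over all (l+1)^m alignments.
        brute = eps / (l + 1) ** m * sum(
            prod(t[(fj, e[aj])] for fj, aj in zip(f, a))
            for a in product(range(l + 1), repeat=m)
        )
        # Equivalent O(lm) form: the sum of products factorizes.
        fast = eps / (l + 1) ** m
        for fj in f:
            fast *= sum(t[(fj, ei)] for ei in e)
        assert abs(brute - fast) < 1e-12
        return fast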

13
Model 1
  • By the technique of Lagrange multipliers, maximizing Pr(f | e) subject to the constraints amounts to finding an unconstrained extremum of the auxiliary function

h(t, λ) = Pr(f | e) - Σ_e λ_e (Σ_f t(f | e) - 1)

  • Setting its partial derivatives to zero yields a fixed-point equation, so the EM algorithm is applied repeatedly (a sketch of one iteration follows).
  • The expected number of times e connects to f in the translation (f | e) is

c(f | e; f, e) = t(f | e) / (t(f | e_0) + ... + t(f | e_l)) · Σ_{j=1}^{m} δ(f, f_j) Σ_{i=0}^{l} δ(e, e_i)
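A compact sketch of one EM iteration for Model 1 over a parallel corpus, implementing the expected-count formula above (variable names are mine; each English sentence is assumed to start with a NULL word for the empty cept):

    from collections import defaultdict

    def model1_em_step(corpus, t):
        # corpus: list of (f_words, e_words) pairs, with e_words[0] == "NULL".
        # t: dict (f_word, e_word) -> current translation probability.
        count = defaultdict(float)   # expected counts c(f|e)
        total = defaultdict(float)   # per-English-word normalizers
        for f, e in corpus:
            for fj in f:
                denom = sum(t[(fj, ei)] for ei in e)
                for ei in e:
                    c = t[(fj, ei)] / denom   # posterior that ei produced fj
                    count[(fj, ei)] += c
                    total[ei] += c
        return {pair: count[pair] / total[pair[1]] for pair in count}

Starting from uniform t and iterating this step is how Model 1 provides the initial estimates that are then piped through Models 2 - 5.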
14
Model 1
15
Model 1 -> Model 2
  • Model 1 does not take into account where words appear in either string; all connections are equally probable.
  • In Model 2, alignment probabilities a(a_j | j, m, l) are introduced,
  • which satisfy the constraints

Σ_{i=0}^{l} a(i | j, m, l) = 1   for each j, m, l.

16
Model 2
  • The likelihood function now is (sketched below)

Pr(f | e) = ε Σ_{a_1=0}^{l} ... Σ_{a_m=0}^{l} ∏_{j=1}^{m} t(f_j | e_aj) a(a_j | j, m, l)

  • and the cost function, with multipliers for both families of constraints, is

h(t, a, λ, μ) = Pr(f | e) - Σ_e λ_e (Σ_f t(f | e) - 1) - Σ_{j,m,l} μ_{jml} (Σ_i a(i | j, m, l) - 1)
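Extending the Model 1 sketch to Model 2 (the alignment table a, keyed by (i, j, m, l), is a hypothetical input; the sum over alignments again factorizes per French position):

    def model2_likelihood(f, e, t, a, eps=1.0):
        # Pr(f|e) under Model 2; e[0] is the NULL word (empty cept).
        l, m = len(e) - 1, len(f)
        p = eps
        for j, fj in enumerate(f, start=1):
            # Sum over the (l + 1) possible connections for position j.
            p *= sum(t[(fj, e[i])] * a[(i, j, m, l)] for i in range(l + 1))
        return p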

17
Fertility and Tablet
  • The fertility of an English word is the number of French words it is connected to - φ_i.
  • Each English word translates to a set of French words called the tablet - τ_i.
  • The collection of tablets is the tableau - τ.
  • The final French string is a permutation π of the words in the tableau (all of these objects are illustrated in the sketch below).
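These objects can be read directly off an alignment; a small illustration using the earlier example (the helper function is mine):

    def fertilities(alignment, l):
        # alignment[j] = English position for French position j+1 (0 = empty cept).
        phi = [0] * (l + 1)
        for aj in alignment:
            phi[aj] += 1
        return phi

    e = ["NULL", "And", "the", "program", "has", "been", "implemented"]
    a = [2, 3, 4, 5, 6, 6, 6]   # the(1) program(2) has(3) been(4) implemented(5,6,7)
    print(fertilities(a, len(e) - 1))   # [0, 0, 1, 1, 1, 1, 3]

Here "implemented" has fertility 3 (its tablet is "mis en application"), while "And" has fertility 0.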

18
Joint Likelihood of a Tableau and Permutation
  • The joint likelihood of a tableau τ and a permutation π decomposes by the chain rule into fertility, translation, and position terms:

Pr(τ, π | e) = ∏_{i=1}^{l} Pr(φ_i | φ_1^{i-1}, e) · Pr(φ_0 | φ_1^{l}, e)
  · ∏_{i=0}^{l} ∏_{k=1}^{φ_i} Pr(τ_ik | τ_i1^{k-1}, τ_0^{i-1}, φ_0^{l}, e)
  · ∏_{i=1}^{l} ∏_{k=1}^{φ_i} Pr(π_ik | π_i1^{k-1}, π_1^{i-1}, τ_0^{l}, φ_0^{l}, e)
  · ∏_{k=1}^{φ_0} Pr(π_0k | π_01^{k-1}, π_1^{l}, τ_0^{l}, φ_0^{l}, e)

  • and Pr(f, a | e) is obtained by summing Pr(τ, π | e) over all pairs (τ, π) consistent with (f, a).

19
Model 3
  • Assumptions:
  • The fertility probability of an English word depends only on the word itself:

Pr(φ_i | φ_1^{i-1}, e) = n(φ_i | e_i)

  • The translation probability is

Pr(τ_ik | τ_i1^{k-1}, τ_0^{i-1}, φ_0^{l}, e) = t(τ_ik | e_i)

  • The distortion probability is

Pr(π_ik | π_i1^{k-1}, π_1^{i-1}, τ_0^{l}, φ_0^{l}, e) = d(π_ik | i, m, l)

20
Model 3
  • The likelihood function for Model 3 is now Pr(f | e) = Σ_a Pr(f, a | e), with

Pr(f, a | e) = C(m - φ_0, φ_0) p_0^(m - 2φ_0) p_1^(φ_0) ∏_{i=1}^{l} φ_i! n(φ_i | e_i) ∏_{j=1}^{m} t(f_j | e_aj) d(j | a_j, m, l)

  • where C(·,·) is the binomial coefficient and p_0, p_1 govern generation from the empty cept (see the sketch below).
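A sketch of Pr(f, a | e) for one alignment under these assumptions (the parameter tables n, t, d and the empty-cept parameters p0, p1 are hypothetical inputs):

    from math import comb, factorial

    def model3_joint(f, e, a, n, t, d, p0, p1):
        # Pr(f, a | e) under Model 3; e[0] is the NULL word,
        # a[j] = English position generating French position j+1.
        l, m = len(e) - 1, len(f)
        phi = [0] * (l + 1)
        for aj in a:
            phi[aj] += 1
        p = comb(m - phi[0], phi[0]) * p0 ** (m - 2 * phi[0]) * p1 ** phi[0]
        for i in range(1, l + 1):
            p *= factorial(phi[i]) * n[(phi[i], e[i])]
        for j, (fj, aj) in enumerate(zip(f, a), start=1):
            p *= t[(fj, e[aj])]
            if aj > 0:   # words from the empty cept carry no distortion term
                p *= d[(j, aj, m, l)]
        return p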

21
Deficiency of Model 3
  • The fertility of word i does not depend on the fertilities of previous words.
  • The model does not always concentrate its probability on events of interest, i.e. it is deficient.
  • This deficiency is not a serious problem: it might decrease the probability of all well-formed strings by a constant factor.

22
Model 4
  • Model 4 allows phrases in the English string to move and be translated as units in the French string.
  • Model 3 doesn't account for this well, because of its word-by-word movement.
  • The distortion probabilities are replaced by d_1(j - c_ρ | A(e_ρ), B(f_j)) for the first word of each cept and d_>1 terms for the remaining words,
  • where A and B are functions (classes) of the English and French words.
  • Using this, they account for facts such as an adjective appearing before the noun in English and after it in French. - THIS IS GOOD!

23
Model 4
  • For example, "implemented" produces "mis en application", all occurring together, whereas "not" produces "ne ... pas", which occurs with a word in between.
  • So d_>1(2 | B(pas)) is relatively large compared to d_>1(2 | B(en)).
  • Models 3 and 4 are both deficient: words can be placed before the first position or beyond the last position in the French string. Model 5 removes this deficiency.

24
Model 5
  • They define v_j to be the number of vacancies up to and including position j, just before forming the words of the i-th cept.
  • This gives rise to distortion probabilities conditioned on the vacancies, so that probability is spent only on positions that are still unfilled (a sketch of the vacancy count follows).
  • Model 5 is powerful but must be used in tandem with the other four models.
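A minimal sketch of the vacancy count itself (my helper, not the paper's notation): v_j is the number of positions up to and including j that no earlier cept has filled.

    def vacancies(filled, j):
        # filled: set of already-occupied French positions (1-based).
        return sum(1 for pos in range(1, j + 1) if pos not in filled)

    print(vacancies({2, 3}, 5))   # 3 vacant positions among 1..5

Because distortion is expressed over vacancies, Model 5 never places a word in an occupied position or outside the string, which is exactly the deficiency being removed.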

25
Results
26
Changing Viterbi Alignments with Iterations
27
Key Points from Results
  • Words like "nodding" have a large fertility because they don't slip gracefully into French.
  • Words like "should" do not have a fertility greater than one, but they translate into many different possible words, so their translation probability is spread more thinly.
  • Words like "the" sometimes have zero fertility, since English prefers an article in some places where French does not.