LING 270 Language, Technology and Society Unit 9: Machine Translation

1
LING 270: Language, Technology and Society
Unit 9: Machine Translation
  • Richard Sproat
  • URL: http://catarina.ai.uiuc.edu/L270/

2
This Lecture
  • Problems in translation
  • Early history of machine translation
  • Approaches to machine translation

3
Problems in translation
4
Problems in translation
5
Lexical mismatches (from Steve Levinson)
6
Lexical mismatches
7
Early views on MT
  • People thought machine translation would be easy
  • Instead, translation has become the 'Holy Grail'
    of NLP

8
Early work on MT
9
Early work
10
1966: the ALPAC report
11
The aftermath of the ALPAC report
12
The aftermath of the ALPAC report
13
The aftermath of the ALPAC report
14
Basic approaches to MT
  • Components of a translation system
  • Source-language analysis
  • Transfer
  • Target-language generation
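
The three components above can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the four-word English-French lexicon, the part-of-speech tags, and the single adjective-noun reordering rule are all invented here and are not taken from the lecture or from any real system.

```python
# Toy pipeline: source-language analysis -> transfer -> target-language generation.
# The lexicon, tags, and reordering rule are hypothetical, for illustration only.

# Hypothetical English lexicon: word -> (part of speech, French equivalent)
LEXICON = {
    "the": ("DET", "le"),
    "black": ("ADJ", "noir"),
    "cat": ("NOUN", "chat"),
    "sleeps": ("VERB", "dort"),
}

def analyze(sentence):
    """Source-language analysis: tokenize and tag each English word."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]

def transfer(tagged):
    """Transfer: swap in French words, then move an adjective after the
    noun it precedes (English ADJ NOUN becomes French NOUN ADJ)."""
    out = [(LEXICON[word][1], tag) for word, tag in tagged]
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair just reordered
        else:
            i += 1
    return out

def generate(transferred):
    """Target-language generation: linearize the words into a string."""
    return " ".join(word for word, _ in transferred)

def translate(sentence):
    return generate(transfer(analyze(sentence)))

print(translate("the black cat sleeps"))  # le chat noir dort
```

A real system would replace each stage with something far richer (a parser, a bilingual transfer grammar, morphological generation), but the division of labor is the same.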

15
Basic approaches to MT
  • Transfer Approaches
  • Interlingua
  • Knowledge-Based MT

16
Different Approaches to MT (Knight 1997)
17
A toy system: VEST
18
VEST architecture
19
Grammar compiler
20
Grammar compiler
21
Parse Tree used for Translation
22
Tree-to-tree transducer
For a more advanced speech-to-speech translation
system see http://verbmobil.dfki.de/overview-us.html.
23
Full-scale transfer approaches
  • All transfer systems are similar to VEST in that
    they involve some amount of syntactic analysis,
    along with lexical transfer.
  • The main advantage of transfer approaches is that
    they make few assumptions about deep semantic
    analysis.
  • The main disadvantage is that you need a
    different system for each language pair.
  • Most commercial systems, such as Systran, use
    some kind of transfer approach.

24
Interlingua approaches
  • Interlingua approaches assume one can do a fairly
    deep semantic analysis into a language
    independent interlingual representation
  • The target language is generated from this
    supposedly language-independent representation
  • Advantage: if you want to translate among M
    languages, instead of M² - M translation systems,
    you just need M analyzers and M generators.
  • Popular in Europe, e.g. in the Eurotra project
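
The counting argument for interlingua systems is easy to check numerically. The sketch below assumes one transfer system per ordered language pair (M squared minus M in total) and one analyzer plus one generator per language for the interlingua case; the function names are made up for this example.

```python
def transfer_systems(m):
    """One transfer system per ordered pair of distinct languages."""
    return m * m - m

def interlingua_components(m):
    """One analyzer and one generator per language."""
    return 2 * m

# Compare the two counts for a few values of M.
for m in (2, 3, 10, 20):
    print(m, transfer_systems(m), interlingua_components(m))
```

The two counts are equal at M = 3 (six each); beyond that the transfer count grows quadratically while the interlingua count grows linearly, e.g. 90 versus 20 for ten languages.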

25
Knowledge-based MT: CMU's KANT System
  • (Nyberg, E. and Mitamura, T. 1992. The KANT
    System: Fast, Accurate, High-Quality Translation
    in Practical Domains. COLING 1992)
  • Controlled input language: about 14,000 words in
    a limited domain
  • Domain Model contains about 500 concept frames
  • Translates from English to Japanese, French and
    German

26
KANT example
27
KANT German translation
28
Knowledge-based MT
29
Some observations
  • Human translators probably use a combination of
    these techniques, as needed
  • They will use transfer mechanisms, particularly
    in translating common locutions
  • They will do a deeper semantic analysis of a text
    in order to provide a more fluent translation
  • They will use real-world knowledge when needed
  • How are MT systems evaluated?
  • On at least the following two dimensions:
  • Fidelity: how accurately does the translation
    reflect the meaning of the original?
  • Fluency: how fluent is the translation with
    respect to the target language?
  • But both require human judges, which makes them
    expensive

30
The BLEU Score
  • Proposed by the IBM statistical MT group
  • Averages (as a geometric mean) the precision on
    1-, 2-, 3- and possibly 4-grams between the
    generated translation and a reference translation
  • Applies an additional length (brevity) penalty if
    the generated translation is too short.
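
A minimal sketch of the score for one sentence and one reference may make the definition concrete. It assumes whitespace-tokenized input, a geometric mean over 1- to 4-gram precisions with counts clipped to the reference, and the usual exponential brevity penalty; the function names are chosen for this example, and real evaluations work at corpus level, usually with several references.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with each
    n-gram's count clipped to its count in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against a single reference (simplified)."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is zero if any precision is zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Penalize candidates that are shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)

reference = "the cat is on the mat".split()
candidate = "the cat is on mat".split()
print(bleu(reference, reference))  # an exact match scores 1.0
print(bleu(candidate, reference))  # a shorter, imperfect match scores lower
```

Because the score needs only a reference translation and no human judge at evaluation time, it is much cheaper to compute than fidelity and fluency ratings.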

31
Statistical approaches
  • Two phases
  • Alignment
  • Translation

32
Alignment models
  • Alignment proceeds through various levels of
    granularity
  • Sentence alignment
  • Word alignment
  • Even character alignment
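
As one possible illustration of word alignment, the sketch below trains a simplified version of IBM Model 1 with EM on a three-sentence toy corpus. This is an assumption about which model to show, not necessarily the alignment model presented in the lecture; the corpus, the function names, and the omission of the NULL word are all simplifications made for this example.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)], an approximation of P(f | e), by EM.
    pairs is a list of (foreign_tokens, english_tokens)."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        for fs, es in pairs:
            for f in fs:
                # E-step: share f's alignment mass among the English words.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

def align(fs, es, t):
    """Link each foreign word to its most probable English word."""
    return [(f, max(es, key=lambda e: t[(f, e)])) for f in fs]

# Hypothetical toy parallel corpus (French, English).
corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("une fleur".split(), "a flower".split()),
]
t = train_model1(corpus)
print(align("la maison".split(), "the house".split(), t))
```

Nothing in the corpus says which word translates which; the model works it out from co-occurrence alone, since "la" appears wherever "the" does and "fleur" wherever "flower" does.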

33
An alignment model
36
Word alignment
37
How misaligned can things be?
38
Statistical MT
  • A simple version of the methods described in
    Brown et al. (1990), for French-to-English,
    consisting of three components:
  • Language model for the target language (English).
  • Translation model
  • Decoder

39
Translation model
40
Decoder: noisy channel model
La Manche
English Channel
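
In the noisy channel view, the decoder picks the English string e that maximizes P(e) · P(f | e) for the observed French string f. The sketch below shows that choice for f = "La Manche" over three candidate outputs. All probabilities and the candidate list are invented for illustration; a real decoder has to search a very large space of candidates rather than score a fixed list.

```python
import math

# Hypothetical probabilities, invented for illustration only.
LANGUAGE_MODEL = {       # P(e): how plausible is the English string?
    "the sleeve": 0.0002,
    "the channel": 0.0005,
    "English Channel": 0.0010,
}
TRANSLATION_MODEL = {    # P(f | e) for the French input f = "La Manche"
    "the sleeve": 0.5,
    "the channel": 0.2,
    "English Channel": 0.3,
}

def score(e):
    """log P(e) + log P(f | e), the noisy channel objective in log space."""
    return math.log(LANGUAGE_MODEL[e]) + math.log(TRANSLATION_MODEL[e])

def decode(candidates):
    """Return the candidate English string with the highest score."""
    return max(candidates, key=score)

print(decode(list(LANGUAGE_MODEL)))  # English Channel
```

With these made-up numbers the translation model alone would prefer the literal "the sleeve", but the language model tips the decision toward "English Channel".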
41
The future of MT
  • The 1999 Johns Hopkins Workshop produced
    public-domain versions of the IBM tools
  • The workshop included 'MT in a Day', a demo of a
    Chinese-English MT system that was built in
    24 hours.
  • See also
    http://www.isi.edu/natural-language/projects/rewrite/
  • More recently there have been DARPA-sponsored
    projects on MT.
  • Some of these involve limited speech-based
    phrase-to-phrase translation
  • The 2005 JHU Workshop had a project on Statistical
    Machine Translation by Parsing
    (http://www.clsp.jhu.edu/ws2005/groups/statistical)

42
(Courtesy of Kevin Knight)