LING 270 Language, Technology and Society Unit 9: Machine Translation presentation

Transcript and Presenter's Notes

1
LING 270: Language, Technology and Society
Unit 9: Machine Translation

Richard Sproat
URL: http://catarina.ai.uiuc.edu/L270/

2
This Lecture

Problems in translation
Early history of machine translation
Approaches to machine translation

3
Problems in translation
4
Problems in translation
5
Lexical mismatches (from Steve Levinson)
6
Lexical mismatches
7
Early views on MT

People thought machine translation would be easy
Instead, translation has become the "Holy Grail"
of NLP

8
Early work on MT
9
Early work
10
1966: the ALPAC report
11
The aftermath of the ALPAC report
12
The aftermath of the ALPAC report
13
The aftermath of the ALPAC report
14
Basic approaches to MT

Components of a translation system
Source-language analysis
Transfer
Target-language generation

15
Basic approaches to MT

Transfer Approaches
Interlingua
Knowledge-Based MT

16
Different Approaches to MT (Knight 1997)
17
A toy system: VEST
18
VEST architecture
19
Grammar compiler
20
Grammar compiler
21
Parse Tree used for Translation
22
Tree-to-tree transducer
For a more advanced speech-to-speech translation
system see http://verbmobil.dfki.de/overview-us.html.
23
Full-scale transfer approaches

All transfer systems are similar to VEST in that
they involve some amount of syntactic analysis,
along with lexical transfer.
The main advantage of transfer approaches is that
they make few assumptions about deep semantic
analysis.
The main disadvantage is that you need a
different system for each language pair.
Most commercial systems, such as Systran, use
some kind of transfer approach.
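
As a rough illustration of the three components listed earlier (source-language analysis, transfer, target-language generation), here is a minimal sketch of a transfer pipeline for English to French. The lexicon, the single reordering rule, and all function names are hypothetical and invented for this example; they are not taken from VEST, Systran, or any real system. Agreement is ignored: the lexicon simply hard-codes the feminine forms.

```python
# Hypothetical toy lexicon: English word -> (part of speech, French word).
# Feminine forms are hard-coded; a real system would handle agreement.
LEXICON = {
    "the": ("DET", "la"),
    "blue": ("ADJ", "bleue"),
    "house": ("N", "maison"),
}


def analyze(sentence):
    """Source-language analysis: tag each English word with its part of speech."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def transfer(tagged):
    """Transfer: replace each word with its French equivalent (lexical transfer)
    and apply one structural rule, ADJ N -> N ADJ."""
    out = [(LEXICON[word][1], pos) for word, pos in tagged]
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "N":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def generate(transferred):
    """Target-language generation: here, simply join the French words."""
    return " ".join(word for word, _ in transferred)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("the blue house"))  # la maison bleue
```

Note that both the lexicon and the reordering rule are specific to the English-French pair, which is the disadvantage mentioned above: another language pair needs its own transfer component.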

24
Interlingua approaches

Interlingua approaches assume one can do a fairly
deep semantic analysis into a language-independent
interlingual representation.
The target language is generated from this
supposedly language-independent representation.
Advantage: if you want to translate among M
languages, instead of M² - M translation systems,
you just need M analyzers and M generators.
Popular in Europe, e.g. in the Eurotra project.
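
The arithmetic behind this advantage can be checked with a short sketch. It assumes one system per ordered language pair for the direct case, and one analyzer plus one generator per language for the interlingua case; the function names are made up for illustration.

```python
def direct_systems(m: int) -> int:
    """One translation system per ordered pair of distinct languages: M^2 - M."""
    return m * (m - 1)


def interlingua_components(m: int) -> int:
    """One analyzer and one generator per language: 2M."""
    return 2 * m


for m in (2, 3, 5, 10, 20):
    print(f"M={m:2d}  pairwise systems={direct_systems(m):3d}  "
          f"interlingua components={interlingua_components(m):2d}")
```

The two counts are equal at M = 3, so by this measure the interlingua design only starts to pay off with four or more languages, and the gap then grows quadratically.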

25
Knowledge-based MT: CMU's KANT System

(Nyberg, E. and Mitamura, T. 1992. The KANT
System: Fast, Accurate, High-Quality Translation
in Practical Domains. COLING 1992)
Controlled input language: about 14,000 words in
a limited domain
Domain model contains about 500 concept frames
Translates from English to Japanese, French and
German

26
KANT example
27
KANT German translation
28
Knowledge-based MT
29
Some observations

Human translators probably use a combination of
these techniques, as needed:
They will use transfer mechanisms, particularly
in translating common locutions.
They will do a deeper semantic analysis of a text
in order to provide a more fluent translation.
They will use real-world knowledge when needed.
How are MT systems evaluated?
On at least the following two dimensions:
Fidelity: how accurately does the translation
reflect the meaning of the original?
Fluency: how fluent is the translation with
respect to the target language?
But both require human judges, which makes them
expensive.

30
The BLEU Score

Proposed by the IBM statistical MT group
Averages the precision on 1-, 2-, 3- and possibly
4-grams between the generated translation and a
reference translation
Applies an additional length penalty if the
generated translation is too short
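
A minimal sketch of a BLEU-style score follows. It assumes a single reference translation, whitespace-tokenized input, and no smoothing, and it scores one sentence rather than a whole test set. As in the published metric, the "average" is a geometric mean of clipped n-gram precisions, and the length penalty is the brevity penalty exp(1 - r/c) for a candidate of length c shorter than a reference of length r. The function names are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu_sketch(candidate, reference, max_n=4):
    """Geometric mean of 1..max_n-gram precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c >= r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "the cat sat on the mat".split()
print(bleu_sketch("the cat sat on the mat".split(), reference))
print(bleu_sketch("the cat sat on".split(), reference))
```

The second call shows the length penalty at work: every n-gram in the short candidate also occurs in the reference, so all precisions are perfect, and only the brevity penalty lowers the score.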

31
Statistical approaches

Two phases
Alignment
Translation

32
Alignment models

Alignment proceeds through various levels of
granularity
Sentence alignment
Word alignment
Even character alignment
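
To make word alignment concrete, here is a minimal sketch of expectation-maximization training for word translation probabilities, in the style of IBM Model 1. This is an assumption: the slides do not say which alignment model they use. The three-sentence corpus is invented, there is no NULL word, and the function names are hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)], an approximation of P(f | e), from sentence pairs.
    pairs: list of (french_tokens, english_tokens)."""
    french_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(french_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm  # E-step: soft alignment of f to e
                    count[(f, e)] += frac
                    total[e] += frac
        for (f, e), c in count.items():  # M-step: renormalize per English word
            t[(f, e)] = c / total[e]
    return t


def align(fs, es, t):
    """Link each French word to the English word that best explains it."""
    return [(f, max(es, key=lambda e: t[(f, e)])) for f in fs]


# Invented toy parallel corpus, already sentence-aligned.
corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("une maison".split(), "a house".split()),
]
t = train_model1(corpus)
for fs, es in corpus:
    print(align(fs, es, t))
```

Nothing in the corpus says which word goes with which. The links emerge only because "la" keeps co-occurring with "the" and "maison" with "house" across sentence pairs, which is why sentence alignment has to come before word alignment.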

33
An alignment model
36
Word alignment
37
How misaligned can things be?
38
Statistical MT

A simple version of the methods described in
Brown et al. (1990), for French-to-English,
consisting of three components:
Language model for the target language (English)
Translation model
Decoder

39
Translation model
40
Decoder: noisy channel model
La Manche
English Channel
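
The noisy channel idea can be sketched as follows: given a French input f, pick the English candidate e that maximizes P(e) * P(f | e), where P(e) comes from the language model and P(f | e) from the translation model. All probabilities and the candidate list below are invented for illustration. A real decoder searches a very large space of English sentences rather than scoring a fixed list.

```python
import math

# Hypothetical language model scores P(e): how plausible is e as English?
LANGUAGE_MODEL = {
    "the English Channel": 0.004,
    "the Sleeve": 0.0001,
    "the Channel English": 0.00001,
}

# Hypothetical translation model scores P(f | e).
TRANSLATION_MODEL = {
    ("la Manche", "the English Channel"): 0.3,
    ("la Manche", "the Sleeve"): 0.6,
    ("la Manche", "the Channel English"): 0.3,
}


def decode(french, candidates):
    """Return the candidate e that maximizes log P(e) + log P(f | e)."""
    def score(english):
        return (math.log(LANGUAGE_MODEL[english])
                + math.log(TRANSLATION_MODEL[(french, english)]))
    return max(candidates, key=score)


print(decode("la Manche", list(LANGUAGE_MODEL)))  # the English Channel
```

In this toy setup the translation model alone would prefer the literal "the Sleeve", and it cannot tell the two word orders of "English Channel" apart. The language model is what settles both questions.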
41
The future of MT

The 1999 Johns Hopkins Workshop produced
public-domain versions of the IBM tools.
The workshop included "MT in a Day", in which an MT
system for Chinese-English that took 24 hours to
build was demoed.
See also http://www.isi.edu/natural-language/projects/rewrite/
More recently there have been DARPA-sponsored
projects on MT.
Some of these involve limited speech-based
phrase-to-phrase translation.
The 2005 JHU Workshop had a project on Statistical
Machine Translation by Parsing
(http://www.clsp.jhu.edu/ws2005/groups/statistical)

42
(Courtesy of Kevin Knight)
