Title: Introduction to MT
1. Introduction to MT
- Ling 580
- Fei Xia
- Week 1: 1/03/06
2. Outline
- Course overview
- Introduction to MT
- Major challenges
- Major approaches
- Evaluation of MT systems
- Overview of word-based SMT
3. Course overview
4. General info
- Course website:
  - Syllabus (incl. slides and papers), updated every week
- Message board
- ESubmit
- Office hour: Fri 10:30am-12:30pm
- Prerequisites:
  - Ling570 and Ling571
  - Programming: C or C++; Perl is a plus
  - Introduction to probability and statistics
5. Expectations
- Reading:
  - Papers are online.
  - Finish reading before class. Bring your questions to class.
- Grade:
  - Leading discussion (1-2 papers): 50%
  - Project: 40%
  - Class participation: 10%
  - No quizzes or exams
6. Leading discussion
- Indicate your choice via EPost by Jan 8.
- You might want to read related papers.
- Make slides with PowerPoint.
- Email me your slides by 3:30am on the Monday before your presentation.
- Present the paper in class and lead the discussion (40-50 minutes).
7. Project
- Details will be available soon.
- Project presentation: 3/7/06
- Final report due: 3/12/06
- Pongo accounts will be ready soon.
8. Introduction to MT
9. A brief history of MT (based on work by John Hutchins)
- Before the computer: in the mid-1930s, a French-Armenian, Georges Artsrouni, and a Russian, Petr Troyanskii, applied for patents for translating machines.
- The pioneers (1947-1954): the first public MT demo was given in 1954 (by IBM and Georgetown University).
- The decade of optimism (1954-1966): the ALPAC (Automatic Language Processing Advisory Committee) report in 1966 concluded that "there is no immediate or predictable prospect of useful machine translation."
10. A brief history of MT (cont)
- The aftermath of the ALPAC report (1966-1980): a virtual end to MT research
- The 1980s: interlingua, example-based MT
- The 1990s: statistical MT
- The 2000s: hybrid MT
11. Where are we now?
- Huge potential/need due to the internet, globalization, and international politics.
- Quick development time due to SMT and the availability of parallel data and computers.
- Translation is reasonable for language pairs with a large amount of resources.
- Starting to include more minor languages.
12. What is MT good for?
- Rough translation: web data
- Computer-aided human translation
- Translation for limited domains
- Cross-lingual IR
- Machine is better than human in:
  - Speed: much faster than humans
  - Memory: can easily memorize millions of word/phrase translations
  - Manpower: machines are much cheaper than humans
  - Fast learner: it takes minutes or hours to build a new system
  - Erasable memory?
  - Never complains, never gets tired, ...
13. Major challenges in MT
14. Translation is hard
- Novels
- Word play, jokes, puns, hidden messages
- Concept gaps: go Greek, bei fen, ...
- Other constraints: lyrics, dubbing, poems, ...
15. Major challenges
- Getting the right words:
  - Choosing the correct root form
  - Getting the correct inflected form
  - Inserting spontaneous words
- Putting the words in the correct order:
  - Word order: SVO vs. SOV, ...
  - Unique constructions
  - Divergences
16. Lexical choice
- Homonymy/polysemy: bank, run
- Concept gaps: no corresponding concept in the other language: go Greek, go Dutch, fen sui, lame duck, ...
- Coding (concept → lexeme mapping) differences:
  - More distinctions in one language, e.g., kinship vocabulary
  - Different divisions of conceptual space
17. Choosing the appropriate inflection
- Inflection: gender, number, case, tense, ...
- Examples:
  - Number (Ch-Eng): all the concrete nouns
    - ch_book → book, books
  - Gender (Eng-Fr): all the adjectives
  - Case (Eng-Korean): all the arguments
  - Tense (Ch-Eng): all the verbs
    - ch_buy → buy, bought, will buy
18. Inserting spontaneous words
- Function words:
  - Determiners (Ch-Eng):
    - ch_book → a book, the book, the books, books
  - Prepositions (Ch-Eng):
    - ch_November → in November
  - Relative pronouns (Ch-Eng):
    - ch_buy ch_book de ch_person → the person who bought /book/
  - Possessive pronouns (Ch-Eng):
    - ch_he ch_raise ch_hand → He raised his hand(s)
  - Conjunctions (Eng-Ch):
    - Although S1, S2 → ch_although S1, ch_but S2
19. Inserting spontaneous words (cont)
- Content words:
  - Dropped arguments (Ch-Eng):
    - ch_buy le ma → Has Subj bought Obj?
  - Chinese first names (Eng-Ch):
    - Jiang → ch_Jiang ch_Zemin
  - Abbreviations, acronyms (Ch-Eng):
    - ch_12 ch_big → the 12th National Congress of the CPC (Communist Party of China)
20. Major challenges
- Getting the right words:
  - Choosing the correct root form
  - Getting the correct inflected form
  - Inserting spontaneous words
- Putting the words in the correct order:
  - Word order: SVO vs. SOV, ...
  - Unique constructions
  - Structural divergences
21. Word order
- SVO, SOV, VSO, ...
- VP PP → PP VP
- VP AdvP → AdvP VP
- Adj N → N Adj
- NP PP → PP NP
- NP S → S NP
- P NP → NP P
22. Unique Constructions
- Overt wh-movement (Eng-Ch):
  - Eng: Why do you think that he came yesterday?
  - Ch: you why think he yesterday come ASP?
  - Ch: you think he yesterday why come?
- Ba-construction (Ch-Eng):
  - She ba homework finish ASP → She finished her homework.
  - He ba wall dig ASP CL hole → He dug a hole in the wall.
  - She ba orange peel ASP skin → She peeled the orange's skin.
23. Translation divergences
- Source and target parse trees (dependency trees) are not identical.
- Example: I like Mary → (S) Marta me gusta a mí (Mary pleases me)
- More discussion next time.
24. Major approaches
25. How do humans do translation?
- Learning a foreign language:
  - Memorize word translations
  - Learn some patterns
  - Exercise:
    - Passive activities: read, listen
    - Active activities: write, speak
- Translation:
  - Understand the sentence
  - Clarify or ask for help (optional)
  - Translate the sentence
- The MT analogy:
  - Training stage: translation lexicon; templates, transfer rules; reinforcement learning? reranking?
  - Decoding stage: parsing, semantic analysis? interactive MT? word-level? phrase-level? generate from meaning?
26. What kinds of resources are available to MT?
- Translation lexicon:
  - Bilingual dictionary
- Templates, transfer rules:
  - Grammar books
- Parallel data, comparable data
- Thesaurus, WordNet, FrameNet, ...
- NLP tools: tokenizer, morphological analyzer, parser, ...
- → More resources for major languages, fewer for minor languages.
27. Major approaches
- Transfer-based
- Interlingua
- Example-based (EBMT)
- Statistical MT (SMT)
- Hybrid approach
28. The MT triangle
[Diagram (the Vauquois triangle): source word at the bottom left, target word at the bottom right, meaning (interlingua) at the apex. Analysis climbs the source side and synthesis descends the target side; word-based SMT and EBMT transfer near the bottom, phrase-based SMT and EBMT higher up, transfer-based MT higher still, and interlingua at the top.]
29. Transfer-based MT
- Analysis, transfer, generation:
  - Parse the source sentence
  - Transform the parse tree with transfer rules
  - Translate source words
  - Read the target sentence off the tree
- Resources required:
  - Source parser
  - A translation lexicon
  - A set of transfer rules
- An example: Mary bought a book yesterday. (A toy sketch of the pipeline follows.)
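To make the analysis-transfer-generation pipeline concrete, here is a minimal, hypothetical Python sketch run on a variant of the example ("Mary bought a red book"). The tree encoding, the single reordering rule, and the toy English-Spanish lexicon are all invented for illustration; real systems use much richer rule formalisms.

# Toy transfer-based MT: parse tree in, target sentence out.
# Trees are nested lists: [label, child1, child2, ...]; leaves are [POS, word].

# Hypothetical transfer rules: a new child order, keyed by the children's labels.
TRANSFER_RULES = {
    ("Det", "Adj", "N"): (0, 2, 1),   # "a red book" -> "un libro rojo"
}

# Hypothetical translation lexicon.
LEXICON = {"Mary": "María", "bought": "compró", "a": "un",
           "red": "rojo", "book": "libro"}

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def transfer(node):
    """Recursively reorder subtrees with transfer rules and translate leaves."""
    if is_leaf(node):
        pos, word = node
        return [pos, LEXICON.get(word, word)]
    label, children = node[0], [transfer(c) for c in node[1:]]
    order = TRANSFER_RULES.get(tuple(c[0] for c in children))
    if order is not None:             # apply at most one rule at each level
        children = [children[i] for i in order]
    return [label] + children

def read_off(node):
    """Generation: read the target sentence off the transformed tree."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in read_off(child)]

# Analysis (here, a hand-built parse of "Mary bought a red book"):
tree = ["S", ["NP", ["N", "Mary"]],
        ["VP", ["V", "bought"],
         ["NP", ["Det", "a"], ["Adj", "red"], ["N", "book"]]]]
print(" ".join(read_off(transfer(tree))))   # María compró un libro rojo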
30. Transfer-based MT (cont)
- Parsing: linguistically motivated grammar or formal grammar?
- Transfer:
  - Context-free rules? A path on a dependency tree?
  - Apply at most one rule at each level?
  - How are rules created?
- Translating words: word-to-word translation?
- Generation: using an LM or other additional knowledge?
- How to create the needed resources automatically?
31. Interlingua
- For n languages, we need n(n-1) direct MT systems.
- Interlingua uses a language-independent representation.
- Conceptually, interlingua is elegant: we only need n analyzers and n generators.
- Resources needed:
  - A language-independent representation
  - Sophisticated analyzers
  - Sophisticated generators
32. Interlingua (cont)
- Questions:
  - Does a language-independent meaning representation really exist? If so, what does it look like?
  - It requires deep analysis: how do we get such an analyzer (e.g., semantic analysis)?
  - It requires non-trivial generation: how is that done?
  - It forces disambiguation at various levels: lexical, syntactic, semantic, and discourse.
  - It cannot take advantage of similarities between a particular language pair.
33. Example-based MT
- Basic idea: translate a sentence by using the closest match in parallel data (see the sketch after this slide).
- First proposed by Nagao (1981).
- Ex:
  - Training data:
    - w1 w2 w3 w4 → w1' w2' w3' w4'
    - w5 w6 w7 → w5' w6' w7'
    - w8 w9 → w8' w9'
  - Test sentence:
    - w1 w2 w6 w7 w9 → w1' w2' w6' w7' w9'
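A minimal sketch of the closest-match step, assuming a toy corpus of (source, target) token lists and plain sequence similarity from Python's difflib; a real EBMT system would align and recombine matching fragments from several examples rather than return one whole pair.

import difflib

def closest_match(src_words, corpus):
    """Return the (source, target) pair whose source side is most
    similar to the input sentence (token-level SequenceMatcher ratio)."""
    return max(corpus, key=lambda pair:
               difflib.SequenceMatcher(None, src_words, pair[0]).ratio())

# Hypothetical parallel corpus, mirroring the slide's w/w' notation.
corpus = [
    ("w1 w2 w3 w4".split(), "w1' w2' w3' w4'".split()),
    ("w5 w6 w7".split(),    "w5' w6' w7'".split()),
    ("w8 w9".split(),       "w8' w9'".split()),
]

src, tgt = closest_match("w1 w2 w6 w7 w9".split(), corpus)
print(src, "->", tgt)   # picks "w5 w6 w7": its contiguous overlap (w6 w7) scores highest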
34. EBMT (cont)
- Types of EBMT:
  - Lexical (shallow)
  - Morphological / POS analysis
  - Parse-tree based (deep)
- Types of data required by EBMT systems:
  - Parallel text
  - Bilingual dictionary
  - Thesaurus for computing semantic similarity
  - Syntactic parser, dependency parser, etc.
35. EBMT (cont)
- Word alignment: using a dictionary and heuristics → exact match
- Generalization:
  - Clusters: dates, numbers, colors, shapes, etc.
  - Clusters can be built by hand or learned automatically.
- Ex:
  - Exact match: 12 players met in Paris last Tuesday → 12 Spieler trafen sich letzten Dienstag in Paris
  - Template: <num> players met in <city> <time> → <num> Spieler trafen sich <time> in <city>
36. Statistical MT
- Basic idea: learn all the parameters from parallel data.
- Major types:
  - Word-based
  - Phrase-based
- Strengths:
  - Easy to build, and it requires no human knowledge
  - Good performance when a large amount of training data is available
- Weaknesses:
  - How to express linguistic generalizations?
37. Comparison of resource requirements
38. Hybrid MT
- Basic idea: combine the strengths of different approaches:
  - Syntax-based: generalization at the syntactic level
  - Interlingua: conceptually elegant
  - EBMT: memorizing translations of n-grams; generalization at various levels
  - SMT: fully automatic; using LMs; optimizing some objective functions
- Types of hybrid MT:
  - Borrowing concepts/methods:
    - SMT from EBMT: phrase-based SMT; alignment templates
    - EBMT from SMT: automatically learned translation lexicon
    - Transfer-based from SMT: automatically learned translation lexicon and transfer rules; using LMs
  - Using two MTs in a pipeline:
    - Using transfer-based MT as a preprocessor for SMT
  - Using multiple MTs in parallel, then adding a re-ranker
39. Evaluation of MT
40. Evaluation
- Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT.
- Human evaluation: accuracy, fluency, ...
  - Problem: expensive, slow, subjective, non-reusable
- Automatic measures:
  - Edit distance
  - Word error rate (WER), position-independent WER (PER)
  - Simple string accuracy (SSA), generation string accuracy (GSA)
  - BLEU
41. Edit distance
- The edit distance (a.k.a. Levenshtein distance) is defined as the minimal cost of transforming str1 into str2 using three operations (substitution, insertion, deletion).
- It is computed with dynamic programming; the complexity is O(mn). (See slides 54-55 for the recurrence and a worked example.)
42. WER, PER, and SSA
- WER (word error rate): edit distance divided by the length of Ref
- PER (position-independent WER): same as WER but disregards word ordering
- SSA (simple string accuracy): 1 - WER
- Previous example:
  - Sys: w1 w2 w3 w4
  - Ref: w1 w3 w2
  - Edit distance = 2
  - WER = 2/3
  - PER = 1/3
  - SSA = 1/3
(A sketch computing these follows.)
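A minimal sketch of the three measures, assuming unit edit costs and one common bag-of-words formulation of PER; it reproduces the numbers above. The edit-distance recurrence itself is spelled out on slide 54.

from collections import Counter

def edit_distance(sys_words, ref_words):
    """Unit-cost Levenshtein distance, dynamic programming over one row."""
    d = list(range(len(ref_words) + 1))
    for i, s in enumerate(sys_words, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref_words, 1):
            prev, d[j] = d[j], min(prev + (s != r),   # substitution / match
                                   d[j] + 1,          # deletion
                                   d[j - 1] + 1)      # insertion
    return d[-1]

def wer(sys_words, ref_words):
    return edit_distance(sys_words, ref_words) / len(ref_words)

def per(sys_words, ref_words):
    """One common PER variant: words missed regardless of position, over |Ref|."""
    matches = sum((Counter(sys_words) & Counter(ref_words)).values())
    return (max(len(sys_words), len(ref_words)) - matches) / len(ref_words)

sys_out = "w1 w2 w3 w4".split()
ref     = "w1 w3 w2".split()
print(wer(sys_out, ref))       # 2/3
print(per(sys_out, ref))       # 1/3
print(1 - wer(sys_out, ref))   # SSA = 1/3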
43. Generation string accuracy (GSA)
- Example:
  - Ref: w1 w2 w3 w4
  - Sys: w2 w3 w4 w1
  - Del = 1, Ins = 1 → SSA = 1/2
  - Move = 1, Del = 0, Ins = 0 → GSA = 3/4
44. BLEU
- Proposed by Papineni et al. (2002).
- The most widely used metric in the MT community.
- BLEU is a weighted geometric mean of n-gram precisions (pn) between the system output and all references, multiplied by a brevity penalty (BP).
45. N-gram precision
- N-gram precision: the percentage of n-grams in the system output that are correct.
- Clipping:
  - Sys: the the the the the the
  - Ref: the cat sat on the mat
  - Unigram precision: clipped to 2/6 (not 6/6)
  - Max_Ref_count: the maximum number of times an n-gram occurs in any single reference translation.
46. N-gram precision
- pn = (clipped n-gram matches, summed over all output sentences) / (total number of n-grams in the system output)
- I.e., the percentage of n-grams in the system output that are correct (after clipping). (A sketch follows.)
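A minimal sketch of clipped n-gram precision under the definition above; the whitespace tokenization and function names are mine. It reproduces the 2/6 from slide 45.

from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def clipped_precision(sys_words, refs, n):
    """Each system n-gram counts at most Max_Ref_count times, i.e. its
    maximum count in any single reference."""
    sys_counts = Counter(ngrams(sys_words, n))
    max_ref = Counter()
    for ref in refs:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in sys_counts.items())
    return clipped / sum(sys_counts.values())

print(clipped_precision("the the the the the the".split(),
                        ["the cat sat on the mat".split()], 1))   # 2/6 ≈ 0.33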
47. Brevity Penalty
- For each sentence si in the system output, find the closest matching reference ri (in terms of length).
- BP = 1 if c ≥ r, and e^(1 - r/c) otherwise, where c is the total length of the system output and r is the effective reference length (the sum of the |ri|).
- Longer system output is already penalized by the n-gram precision measure, so BP only punishes output that is too short.
48. An example
- Sys: The cat was on the mat
- Ref1: The cat sat on a mat
- Ref2: There was a cat on the mat
- Assuming N = 3:
  - p1 = 5/6, p2 = 3/5, p3 = 1/4, BP = 1 → BLEU = 0.50
- What if N = 4? (See the sketch below.)
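A minimal single-sentence BLEU sketch with uniform weights, repeating the clipped-precision helper so it stands alone; it reproduces this slide's numbers. With N = 4 there are no matching 4-grams, so p4 = 0 and unsmoothed BLEU drops to 0.

import math
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def clipped_precision(sys_words, refs, n):
    sys_counts = Counter(ngrams(sys_words, n))
    max_ref = Counter()
    for ref in refs:
        for g, c in Counter(ngrams(ref, n)).items():
            max_ref[g] = max(max_ref[g], c)
    return sum(min(c, max_ref[g]) for g, c in sys_counts.items()) / sum(sys_counts.values())

def bleu(sys_words, refs, N=3):
    c = len(sys_words)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]   # closest ref length
    bp = 1.0 if c >= r else math.exp(1 - r / c)                 # brevity penalty
    ps = [clipped_precision(sys_words, refs, n) for n in range(1, N + 1)]
    if min(ps) == 0:              # unsmoothed: any zero precision zeroes BLEU
        return 0.0
    return bp * math.exp(sum(math.log(p) for p in ps) / N)     # geometric mean

sys_out = "the cat was on the mat".split()
refs = ["the cat sat on a mat".split(),
        "there was a cat on the mat".split()]
print(round(bleu(sys_out, refs, N=3), 2))   # 0.5
print(bleu(sys_out, refs, N=4))             # 0.0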
49. Summary
- Course overview
- Major challenges in MT:
  - Choosing the right words (root form, inflection, spontaneous words)
  - Putting them in the right positions (word order, unique constructions, divergences)
50. Summary (cont)
- Major approaches:
  - Transfer-based MT
  - Interlingua
  - Example-based MT
  - Statistical MT
  - Hybrid MT
- Evaluation of MT systems:
  - Edit distance
  - WER, PER, SSA, GSA
  - BLEU
51. Additional slides
52. Translation divergences (based on Bonnie Dorr's work)
- Thematic divergence: I like Mary → (S) Marta me gusta a mí (Mary pleases me)
- Promotional divergence: John usually goes home → (S) Juan suele ir a casa (John tends to go home)
- Demotional divergence: I like eating → (G) Ich esse gern (I eat likingly)
- Structural divergence: John entered the house → (S) Juan entró en la casa (John entered in the house)
53. Translation divergences (cont)
- Conflational divergence: I stabbed John → (S) Yo le di puñaladas a Juan (I gave knife-wounds to John)
- Categorial divergence: I am hungry → (G) Ich habe Hunger (I have hunger)
- Lexical divergence: John broke into the room → (S) Juan forzó la entrada al cuarto (John forced the entry to the room)
54Calculating edit distance
- D(0, 0) 0
- D(i, 0) delCost i
- D(0, j) insCost j
- D(i1, j1)
- min( D(i,j) sub,
- D(i1, j) insCost,
- D(i, j1) delCost)
- sub 0 if str1i1str2j1
- subCost otherwise
55. An example
- Sys: w1 w2 w3 w4
- Ref: w1 w3 w2
- All three costs are 1.
- Edit distance = 2
(A sketch implementing the recurrence follows.)
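A direct Python sketch of slide 54's recurrence, with the three costs as parameters; it reproduces the distance of 2 for this example.

def edit_distance(str1, str2, subCost=1, insCost=1, delCost=1):
    """D[i][j]: minimal cost of transforming str1[:i] into str2[:j]."""
    m, n = len(str1), len(str2)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = delCost * i
    for j in range(1, n + 1):
        D[0][j] = insCost * j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if str1[i - 1] == str2[j - 1] else subCost
            D[i][j] = min(D[i - 1][j - 1] + sub,       # substitute (or match)
                          D[i][j - 1] + insCost,       # insert str2[j-1]
                          D[i - 1][j] + delCost)       # delete str1[i-1]
    return D[m][n]

print(edit_distance("w1 w2 w3 w4".split(), "w1 w3 w2".split()))   # 2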