Course Summary - PowerPoint PPT Presentation

Title: Course Summary


1
Course Summary
  • LING 575
  • Fei Xia
  • 03/06/07

2
Outline
  • Introduction to MT 1
  • Major approaches
  • SMT 3
  • Transfer-based MT 2
  • Hybrid systems 2
  • Other topics

3
Introduction to MT
4
Major challenges
  • Translation is hard.
  • Getting the right words
    • Choosing the correct root form
    • Getting the correct inflected form
    • Inserting spontaneous words
  • Putting the words in the correct order
    • Word order: SVO vs. SOV, ...
    • Unique constructions
    • Divergence

5
Lexical choice
  • Homonymy/Polysemy: bank, run
  • Concept gap: no corresponding concept in another
    language, e.g., go Greek, go Dutch, fen sui, lame
    duck, ...
  • Coding (concept → lexeme mapping) differences
    • More distinctions in one language, e.g., kinship
      vocabulary.
    • Different division of conceptual space

6
Major approaches
  • Transfer-based
  • Interlingua
  • Example-based (EBMT)
  • Statistical MT (SMT)
  • Hybrid approach

7
The MT triangle
[Figure: the MT triangle. Apex: meaning (interlingua). Sides: analysis and synthesis. Levels from apex to base: transfer-based; phrase-based SMT, EBMT; word-based SMT, EBMT. Base: word to word.]

8
Comparison of resource requirements
9
Evaluation
  • Unlike many NLP tasks (e.g., tagging, chunking,
    parsing, IE, pronoun resolution), there is no
    single gold standard for MT.
  • Human evaluation: accuracy, fluency, ...
    • Problem: expensive, slow, subjective,
      non-reusable.
  • Automatic measures
    • Edit distance
    • Word error rate (WER), position-independent WER
      (PER)
    • Simple string accuracy (SSA), generation string
      accuracy (GSA)
    • BLEU
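
The edit-distance-based measures above can be sketched as follows. This is a minimal illustration, assuming a single reference translation and whitespace tokenization. The function names are hypothetical, and PER is computed with one common bag-of-words formulation; other definitions exist.

```python
from collections import Counter

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)

def per(ref, hyp):
    """Position-independent WER: compares bags of words, ignoring order."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)

ref = "the cat sat on the mat".split()
hyp = "on the mat the cat sat".split()
print(wer(ref, hyp), per(ref, hyp))
```

In this example the hypothesis contains the right words in a different order, so PER is zero while WER is not; the gap between the two gives a rough indication of how much of the error is due to word order.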

10
Major approaches
11
Word-based SMT
  • IBM Models 1-5
  • Main concepts
    • Source channel model
    • Hidden word alignment
    • EM training

12
Source channel model for MT
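[Figure: an English sentence, generated with probability P(E), passes through a noisy channel P(F | E) and comes out as a French sentence.]
  • Two types of parameters
    • Language model: P(E)
    • Translation model: P(F | E)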
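
The two models combine through Bayes' rule. As a short derivation in the usual notation (F is the observed French sentence, E a candidate English sentence), the decoder looks for:

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
        = \arg\max_{E} \frac{P(E)\, P(F \mid E)}{P(F)}
        = \arg\max_{E} P(E)\, P(F \mid E)
```

P(F) can be dropped because it does not depend on E.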

13
Modeling P(F | E) with alignment
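
In the standard IBM-model formulation, the alignment is a hidden variable that is summed out. The notation below is an assumption following the usual convention: F = f_1 ... f_m, E = e_1 ... e_l, e_0 is the empty (NULL) word, and a_j is the English position that f_j is aligned to.

```latex
P(F \mid E) = \sum_{a} P(F, a \mid E),
\qquad a = (a_1, \dots, a_m), \quad a_j \in \{0, 1, \dots, l\}
```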
14
Modeling
Model 1
Model 2
  • Parameters
    • Length prob: P(m | l)
    • Translation prob: t(f_j | e_i)
    • Distortion prob (for Model 2): d(i | j, m, l)
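
With these parameters, the two models are usually written as below. This is the textbook form (after Brown et al., 1993), stated under the assumption that m is the French length, l the English length, and e_0 the NULL word.

```latex
% Model 1: every alignment is equally likely
P(F, a \mid E) = \frac{P(m \mid l)}{(l+1)^{m}} \prod_{j=1}^{m} t(f_j \mid e_{a_j})

% Model 2: alignment positions are weighted by the distortion probability
P(F, a \mid E) = P(m \mid l) \prod_{j=1}^{m} t(f_j \mid e_{a_j})\, d(a_j \mid j, m, l)

% Summing over alignments (Model 2; Model 1 is the case d = 1/(l+1))
P(F \mid E) = P(m \mid l) \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)\, d(i \mid j, m, l)
```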

15
Training
  • Model 1
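
EM training for Model 1 can be sketched as follows. This is a minimal illustration on a toy corpus, not the course's reference implementation: the function name, the "NULL" token, and the data are assumptions, and only the translation table t(f | e) is estimated.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """corpus: list of (english_tokens, french_tokens). Returns t[(f, e)] = t(f | e)."""
    f_vocab = {f for _, fs in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of f aligned to e
        total = defaultdict(float)   # expected count of e being aligned to anything
        # E-step: collect fractional counts from the alignment posteriors
        for es, fs in corpus:
            es = ["NULL"] + es       # position 0 is the empty word
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize so that t(. | e) sums to 1 for each e
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

corpus = [
    (["the", "house"], ["la", "maison"]),
    (["the", "flower"], ["la", "fleur"]),
    (["a", "flower"], ["une", "fleur"]),
]
t = train_model1(corpus)
print(t[("maison", "house")], t[("la", "house")])
```

Because "la" co-occurs with "the" in two sentence pairs, EM shifts probability mass so that "house" prefers "maison" over "la", even though both start out equally likely.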

16
Finding the best alignment
Given E and F, we are looking for the most probable alignment.
Model 1
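
In the usual formulation the best (Viterbi) alignment is defined as below. For Model 1 the search decomposes per French word, because each a_j can be chosen independently. The notation assumes positions i = 0 ... l on the English side, with e_0 the NULL word.

```latex
\hat{a} = \arg\max_{a} P(a \mid E, F) = \arg\max_{a} P(F, a \mid E)

% Model 1: pick the best English word for each French word
\hat{a}_j = \arg\max_{0 \le i \le l} t(f_j \mid e_i), \qquad j = 1, \dots, m
```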
17
Clump-based SMT
  • The unit of translation is a clump.
  • Training stage
    • Word alignment
    • Extracting clump pairs
  • Decoding stage
    • Try all segmentations of the src sent and all the
      allowed permutations
    • For each src clump, try TopN tgt clumps
    • Prune the hypotheses
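
The "extracting clump pairs" step can be sketched as follows. This is a simplified version of the consistency criterion commonly used in phrase-based SMT, assumed here rather than taken from the slides: a source span and a target span form a pair if no alignment link leaves the box they define. The function name and example are hypothetical, and the extension to unaligned boundary words is omitted.

```python
def extract_clump_pairs(src, tgt, alignment, max_len=4):
    """alignment: set of (src_index, tgt_index) links, 0-based."""
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            # Target positions linked to the source span [s1, s2]
            linked = [t for (s, t) in alignment if s1 <= s <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no word in the target span may link outside the source span
            if any(t1 <= t <= t2 and not (s1 <= s <= s2) for (s, t) in alignment):
                continue
            pairs.add((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs

src = ["la", "maison", "bleue"]
tgt = ["the", "blue", "house"]
alignment = {(0, 0), (1, 2), (2, 1)}
for pair in sorted(extract_clump_pairs(src, tgt, alignment)):
    print(pair)
```

Note that "la maison" is not extracted: its target span would have to cover "the blue house", which also contains "blue", a word aligned outside the source span.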

18
Transfer-based MT
  • Analysis, transfer, generation
    • Example (Quirk et al., 2005)
      • Parse the source sentence
      • Transform the parse tree with transfer rules
      • Translate source words
      • Get the target sentence from the tree
  • Translation as parsing
    • Example (Wu, 1995)
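
The analysis, transfer, generation pipeline can be illustrated with a toy sketch. Everything here is hypothetical and is not the method of Quirk et al. (2005): the parse is assumed to be given, there is a single hand-written reordering rule (SVO to SOV), and the lexicon maps source words to placeholder target tokens.

```python
# Hypothetical transfer rule: for a VP with children (V, NP), emit NP before V.
REORDER = {("VP", ("V", "NP")): (1, 0)}
# Hypothetical lexicon with placeholder target-side tokens.
LEXICON = {"John": "T_John", "eats": "T_eats", "apples": "T_apples"}

def transfer(tree):
    """tree: (label, word) for a leaf, or (label, [children]) otherwise."""
    label, children = tree
    if isinstance(children, str):                      # leaf: translate the word
        return (label, LEXICON.get(children, children))
    children = [transfer(c) for c in children]
    order = REORDER.get((label, tuple(c[0] for c in children)))
    if order:                                          # apply the reordering rule
        children = [children[i] for i in order]
    return (label, children)

def yield_of(tree):
    """Read the target sentence off the leaves of the transformed tree."""
    label, children = tree
    if isinstance(children, str):
        return [children]
    return [w for c in children for w in yield_of(c)]

source_tree = ("S", [("NP", "John"),
                     ("VP", [("V", "eats"), ("NP", "apples")])])
print(yield_of(transfer(source_tree)))
```

A real system would learn many such rules from parsed, word-aligned data instead of writing them by hand.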

19
Hybrid approaches
  • Preprocessing with transfer rules (Xia and
    McCord, 2004), (Collins et al., 2005)
  • Postprocessing with taggers, parsers, etc. (JHU
    2003 workshop)
  • Hierarchical phrase-based model (Chiang, 2005)

20
Other topics
21
Other issues
  • Resources
    • MT for low-density languages
    • Using comparable corpora and Wikipedia
  • Special translation modules
    • Identifying and translating named entities and
      abbreviations

22
To build an MT system (1)
  • Gather resources
    • Parallel corpora, comparable corpora
    • Grammars, dictionaries, ...
  • Process data
    • Document alignment, sentence alignment
    • Tokenization, parsing, ...

23
To build an MT system (2)
  • Modeling
  • Training
    • Word alignment and extracting clump pairs
    • Learning transfer rules
  • Decoding
    • Identifying entities and translating them with
      special modules (optional)
    • Translation as parsing, or parse, transfer,
      translate
    • Segmenting the src sentence, replacing each src
      clump with a target clump, ...
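
The "segment and replace" style of decoding can be sketched as follows. This is a deliberately reduced illustration under stated assumptions: no reordering of clumps, no language model, and a hypothetical clump table with made-up probabilities. Under those assumptions dynamic programming over source prefixes finds the best segmentation exactly, so no pruning is needed.

```python
import math

def decode_monotone(src, table, max_len=3):
    """table: {src_clump: [(tgt_clump, prob), ...]}. Returns (log_prob, target_tokens)."""
    n = len(src)
    best = [(-math.inf, [])] * (n + 1)   # best[k]: best translation of src[:k]
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev_score, prev_out = best[start]
            if prev_score == -math.inf:
                continue                 # src[:start] has no translation
            clump = " ".join(src[start:end])
            for tgt, p in table.get(clump, []):
                score = prev_score + math.log(p)
                if score > best[end][0]:
                    best[end] = (score, prev_out + tgt.split())
    return best[n]

# Hypothetical clump table
table = {
    "la": [("the", 0.9)],
    "maison": [("house", 0.8)],
    "bleue": [("blue", 0.9)],
    "maison bleue": [("blue house", 0.8)],
}
score, words = decode_monotone("la maison bleue".split(), table)
print(" ".join(words), math.exp(score))
```

Here the two-word clump carries the local reordering, which is one reason clumps work better than single words as the unit of translation. A full decoder would also permute clumps and score hypotheses with P(E).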

24
To build an MT system (3)
  • Post-processing
    • System combination
    • Reranking
  • Using the system for other applications
    • Cross-lingual IR
    • Computer-assisted translation
    • ...

25
Misc
  • Grades
    • Assignments (hw1-hw3): 30%
    • Class participation: 20%
    • Project
      • Presentation: 25%
      • Final paper: 25%