Title: Outline for MT
1. Outline for MT
- Intro and a little history
- Language Similarities and Divergences
- Four main MT Approaches
  - Transfer
  - Interlingua
  - Direct
  - Statistical
- Evaluation
2. Project Description
- YES
  - Andy Cotton, Ross Cranford, Ralph Harris, Andrew Martin, Matt Ratliff, Allen Rawls, Chris Tripp
- KINDA
  - Jason Forsythe, Matt Singletary
- NO
  - Jerry Martin, Bret Mohler, Rose Rahiminejad, Dan Reeves, Bill Shipman, Tom Starr
3. What is MT?
- Translating a text from one language to another automatically.
4. Machine Translation
- As she lay there alone, Dai-yu's thoughts turned
to Bao-chai... Then she listened to the insistent
rustle of the rain on the bamboos and plantains
outside her window. The coldness penetrated the
curtains of her bed. Almost without noticing it
she had begun to cry.
- Dai-yu alone on bed top think-of-with-gratitude
Bao-chai again listen to window outside bamboo
tip plantain leaf of on-top rain sound sigh drop
clear cold penetrate curtain not feeling again
fall down tears come
5. Machine Translation
- The Story of the Stone
  - The Dream of the Red Chamber (Cao Xueqin 1792)
- Issues
  - Breaking up into words
  - Breaking up into sentences
  - Anaphora
  - Penetrate -> penetrated
  - Bamboo tip plantain leaf -> bamboos and plantains
  - Curtain -> curtains of her bed
  - Rain sound sigh drop -> insistent rustle of the rain
6. What is MT not good for?
- Really hard stuff
  - Literature
  - Natural spoken speech (meetings, court reporting)
- Really important stuff
  - Medical translation in hospitals, 911
7. What is MT good for?
- Tasks for which a rough translation is fine
  - Web pages, email
- Tasks for which MT can be post-edited
  - MT as first pass
  - Computer-aided human translation
- Tasks in sublanguage domains where high-quality MT is possible
8. Sublanguage domain
- Weather forecasting
  - "Cloudy with a chance of showers today and Thursday"
  - "Low tonight 4"
- Can be modeled completely enough to use raw MT output
- Word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT
9. History of MT: Pessimism
- 1959/1960: Bar-Hillel, "Report on the state of MT in US and GB"
  - Argued FAHQT (fully automatic high-quality translation) is too hard (semantic ambiguity, etc.)
  - Should work on semi-automatic instead of automatic
- His argument: "Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy."
  - Only human knowledge lets us know that playpens are bigger than boxes, but writing pens are smaller
  - His claim: we would have to encode all of human knowledge
10. History of MT
- 1976: Meteo, weather forecasts from English to French
- Systran (Babelfish) has been used for 40 years
- 1970s
  - European focus in MT; mainly ignored in US
- 1980s
  - Ideas of using AI techniques in MT (KBMT, CMU)
- 1990s
  - Commercial MT systems
  - Statistical MT
  - Speech-to-speech translation
11. Language Similarities and Divergences
- Some aspects of human language are universal or near-universal, others diverge greatly.
- Typology: the study of systematic cross-linguistic similarities and differences
- What are the dimensions along which human languages vary?
12. Morphological Variation
- Isolating languages
  - Cantonese, Vietnamese: each word generally has one morpheme
- Vs. Polysynthetic languages
  - Siberian Yupik (Eskimo): single word may have very many morphemes
- Agglutinative languages
  - Turkish: morphemes have clean boundaries
- Vs. Fusion languages
  - Russian: single affix may have many morphemes
13. Syntactic Variation
- SVO (Subject-Verb-Object) languages
  - English, German, French, Mandarin
- SOV languages
  - Japanese, Hindi
- VSO languages
  - Irish, Classical Arabic
- SVO languages generally have prepositions: "to Yuriko"
- SOV languages generally have postpositions: "Yuriko ni"
14. Segmentation Variation
- Not every writing system has word boundaries marked
  - Chinese, Japanese, Thai, Vietnamese
- Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences
  - Modern Standard Arabic, Chinese
15. Inferential Load
- Some languages require the hearer to do more figuring out of who the various actors in the various events are
  - Japanese, Chinese
- Other languages are pretty explicit about saying who did what to whom
  - English
16. Inferential Load (2)
All noun phrases in blue do not appear in the Chinese text, but they are needed for a good translation.
17. Lexical Divergences
- Word to phrases
  - English "computer science" vs. French "informatique"
- POS divergences
  - Eng. she likes/VERB to sing
  - Ger. Sie singt gerne/ADV
  - Eng. I'm hungry/ADJ
  - Sp. tengo hambre/NOUN
18. Lexical Divergences: Specificity
- Grammatical constraints
  - English has gender on pronouns, Mandarin does not.
  - So translating 3rd person from Chinese to English, need to figure out gender of the person!
  - Similarly from English "they" to French "ils/elles"
- Semantic constraints
  - English "brother"
    - Mandarin "gege" (older) versus "didi" (younger)
  - English "wall"
    - German "Wand" (inside) versus "Mauer" (outside)
  - German "Berg"
    - English "hill" or "mountain"
19. Lexical Divergence: one-to-many
20. Lexical Divergence: lexical gaps
- Japanese: no word for "privacy"
- English: no word for Cantonese "haauseun" or Japanese "oyakoko" (something like "filial piety")
- English "cow" versus "beef", Cantonese "ngau"
21. Event-to-argument divergences
- English
  - The bottle floated out.
- Spanish
  - La botella salió flotando.
  - "The bottle exited floating"
- Verb-framed languages mark direction of motion on the verb
  - Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families
- Satellite-framed languages mark direction of motion on the satellite
  - Crawl out, float off, jump down, walk over to, run after
  - Rest of Indo-European, Hungarian, Finnish, Chinese
22. MT on the web
- Babelfish
  - http://babelfish.altavista.com/
23. 3 methods for MT
- Direct
- Transfer
- Interlingua
24. Three MT Approaches: Direct, Transfer, Interlingual
[Figure: Vauquois triangle. The source side climbs from Source Text through Morphological Analysis (Word Structure), Syntactic Analysis (Syntactic Structure), Semantic Analysis (Semantic Structure), and Semantic Composition to the Interlingua at the apex. The target side descends through Semantic Decomposition, Semantic Generation, Syntactic Generation, and Morphological Generation to Target Text. Horizontal links between the two sides, from bottom to top: Direct, Syntactic Transfer, Semantic Transfer.]
Slide: Bonnie Dorr. Original metaphor due to Bernard Vauquois.
25. The Transfer Model
- Idea: apply contrastive knowledge, i.e., knowledge about the difference between two languages
- Steps
  - Analysis: syntactically parse Source language
  - Transfer: rules to turn this parse into parse for Target language
  - Generation: generate Target sentence from parse tree
26. Transfer architecture
27. English to French
- Generally
  - English: Adjective Noun
  - French: Noun Adjective
  - Note: not always true
    - Route mauvaise: "bad road, badly-paved road"
    - Mauvaise route: "wrong road"
  - But is a reasonable first approximation
- Rule
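The adjective-noun reordering above can be sketched as a tiny transfer step. This is a minimal illustration, not the rule from the original slide: it works on a flat list of part-of-speech-tagged words rather than a parse tree, and the tag names, the `LEXICON` entries, and both function names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Tagged = Tuple[str, str]  # (word, part-of-speech tag)

# Toy English-to-French lexicon (hypothetical entries; agreement is hard-coded).
LEXICON: Dict[str, str] = {"the": "la", "green": "verte", "house": "maison"}


def reorder_adj_noun(tagged: List[Tagged]) -> List[Tagged]:
    """Swap every adjacent ADJ NOUN pair into NOUN ADJ order."""
    out: List[Tagged] = []
    i = 0
    while i < len(tagged):
        is_pair = (
            i + 1 < len(tagged)
            and tagged[i][1] == "ADJ"
            and tagged[i + 1][1] == "NOUN"
        )
        if is_pair:
            out.extend([tagged[i + 1], tagged[i]])
            i += 2
        else:
            out.append(tagged[i])
            i += 1
    return out


def transfer(tagged: List[Tagged]) -> str:
    """Reorder, then substitute each word using the toy lexicon."""
    reordered = reorder_adj_noun(tagged)
    return " ".join(LEXICON.get(word, word) for word, _ in reordered)


english = [("the", "DET"), ("green", "ADJ"), ("house", "NOUN")]
print(transfer(english))  # la maison verte
```

As the slide notes, the rule is only a first approximation: a pair like "mauvaise route" would need a lexical exception that this sketch does not model.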
28. Lexical Transfer
- Man
  - Ojisan: "old man"
  - "Man is the only linguistic animal"
    - Ningen: "man, human being"
  - Or
    - Hito: "person, persons"
- Can treat like lexical ambiguity
  - Disambiguate during parsing
29. Transfer: some problems
- N^2 sets of transfer rules!
- Grammar and lexicon full of language-specific stuff
- Hard to build, hard to maintain
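The rule-set count can be checked with simple arithmetic. The sketch below assumes one set of transfer rules per ordered (source, target) language pair, which gives n(n-1) sets and so grows on the order of N^2. For comparison it also assumes that a meaning-based system needs one analyzer and one generator per language. Both function names are illustrative.

```python
def transfer_rule_sets(n_languages: int) -> int:
    """One rule set per ordered (source, target) pair of distinct languages."""
    return n_languages * (n_languages - 1)


def meaning_based_modules(n_languages: int) -> int:
    """Assumed alternative: one analyzer plus one generator per language."""
    return 2 * n_languages


for n in (2, 5, 10, 20):
    print(n, transfer_rule_sets(n), meaning_based_modules(n))
```

Under these assumptions, 10 languages need 90 transfer rule sets but only 20 analysis and generation modules.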
30. MT Method 2: Interlingua
- Intuition: instead of language-to-language knowledge rules, use the meaning of the sentence to help
- Steps
  - 1) Translate source sentence into meaning representation
  - 2) Generate target sentence from meaning
31. Interlingua for "there was an old man gardening"
- EVENT: GARDENING
  - AGENT: MAN
    - NUMBER: SG
    - DEFINITENESS: INDEF
  - ASPECT: PROGRESSIVE
  - TENSE: PAST
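The frame above can be written down as a nested data structure, and the generation step then reads only from it. This is a rough sketch under stated assumptions: the `CONCEPT` key, the toy lexicons, and `generate_english` are hypothetical, and the generator handles only the past progressive.

```python
from typing import Any, Dict

# The slide's frame as a nested dict; the "CONCEPT" key is an assumption.
frame: Dict[str, Any] = {
    "EVENT": "GARDENING",
    "AGENT": {"CONCEPT": "MAN", "NUMBER": "SG", "DEFINITENESS": "INDEF"},
    "ASPECT": "PROGRESSIVE",
    "TENSE": "PAST",
}

# Toy English lexicons (hypothetical entries for illustration only).
NOUNS = {"MAN": {"SG": "man", "PL": "men"}}
PROGRESSIVE_VERBS = {"GARDENING": "gardening"}


def generate_english(meaning: Dict[str, Any]) -> str:
    """Generate a toy English clause from the meaning frame alone."""
    agent = meaning["AGENT"]
    noun = NOUNS[agent["CONCEPT"]][agent["NUMBER"]]
    if agent["DEFINITENESS"] == "DEF":
        noun_phrase = f"the {noun}"
    elif agent["NUMBER"] == "SG":
        noun_phrase = f"a {noun}"
    else:
        noun_phrase = noun
    if meaning["TENSE"] != "PAST" or meaning["ASPECT"] != "PROGRESSIVE":
        raise NotImplementedError("toy generator handles past progressive only")
    auxiliary = "was" if agent["NUMBER"] == "SG" else "were"
    verb = PROGRESSIVE_VERBS[meaning["EVENT"]]
    return f"{noun_phrase} {auxiliary} {verb}"


print(generate_english(frame))  # a man was gardening
```

The output omits "old" because the frame as given does not encode it. A generator for another language would read the same frame through its own lexicon.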
32. Interlingua
- Idea is that some of the MT work that we need to do is part of other NLP tasks
- E.g., disambiguating E: book / S: libro from E: book / S: reservar
- So we could have concepts like BOOKVOLUME and RESERVE and solve this problem once for each language
33. Vauquois diagram
34. Direct Translation
- Idea: more robust, word-specific models
- Start with a Source language sentence
- Write little transformations, directly on words, to turn it into a Target language sentence.
35. Direct MT: J-to-E
- Watashihatsukuenouenopenwojonniageta.
- 1) Morphological analysis
  - Watashi ha tsukue no ue no pen wo jon ni ageru PAST
- 2) Lexical transfer of content words
  - I ha desk no ue no pen wo John ni give PAST
- 3) Various preposition work
  - I ha pen on desk wo John to give PAST.
- 4) SVO rearrangements
  - I give PAST pen on desk John to.
- 5) Miscellany
  - I give PAST the pen on the desk to John.
- 6) Morphological generation
  - I gave the pen on the desk to John.
36. Direct MT: stage 2 (ex. from Panov 1960 via Hutchins 1986)
- Function direct-translate-much/many
  - If preceding word is "how"
    - Return skolko
  - Else if preceding word is "as"
    - Return skolko zhe
  - Else if word is "much"
    - If preceding word is "very"
      - Return nil (not translated)
    - Else if following word is a noun
      - Return mnogo
  - Else (word is "many")
    - If preceding word is PREP and following is NOUN
      - Return mnogii
    - Else return mnogo
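The decision procedure above can be sketched as a runnable function. The sketch assumes the sentence arrives as parallel lists of words and part-of-speech tags, with `PREP` and `NOUN` as tag names. The signature is an assumption, and the Russian strings are copied as transliterated on the slide. The slide does not say what to return for "much" when neither inner test applies, so the sketch returns `None` there.

```python
from typing import Optional, Sequence


def direct_translate_much_many(
    words: Sequence[str], tags: Sequence[str], i: int
) -> Optional[str]:
    """Translate words[i] ("much" or "many") using only its neighbours."""
    word = words[i].lower()
    prev_word = words[i - 1].lower() if i > 0 else None
    prev_tag = tags[i - 1] if i > 0 else None
    next_tag = tags[i + 1] if i + 1 < len(tags) else None

    if prev_word == "how":
        return "skolko"
    if prev_word == "as":
        return "skolko zhe"
    if word == "much":
        if prev_word == "very":
            return None  # not translated
        if next_tag == "NOUN":
            return "mnogo"
        return None  # case left open on the slide (assumption)
    # Otherwise the word is "many".
    if prev_tag == "PREP" and next_tag == "NOUN":
        return "mnogii"
    return "mnogo"


print(direct_translate_much_many(["in", "many", "cases"], ["PREP", "ADJ", "NOUN"], 1))
```

Each word that needs special handling gets its own small function like this one, which is where the rule proliferation of direct systems comes from.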
37. Three MT Approaches: Direct, Transfer, Interlingual
[Figure: Vauquois triangle, repeated from slide 24, with Source Text and Target Text at the base, the Interlingua at the apex, and Direct, Syntactic Transfer, and Semantic Transfer as horizontal links.]
Slide: Bonnie Dorr. Original metaphor due to Bernard Vauquois.
38. 3 methods: pros and cons
39. Direct MT: pros and cons (Bonnie Dorr)
- Pros
  - Fast
  - Simple
  - Cheap
  - No translation rules hidden in lexicon
- Cons
  - Unreliable
  - Not powerful
  - Rule proliferation
  - Requires lots of context
  - Major restructuring after lexical substitution
40. Interlingual MT: pros and cons (B. Dorr)
- Pros
  - Avoids the N^2 problem
  - Easier to write rules
- Cons
  - Semantics is HARD
  - Useful information lost (paraphrase)
41. Summary
- Intro and a little history
- Language Similarities and Divergences
- Four main MT Approaches
  - Transfer
  - Interlingua
  - Direct
  - Statistical
- Evaluation