Title: MEAD Project Technical Overview
1MEAD Project Technical Overview
- Alon Lavie
- November 19, 2004
2Expected Functionality
- Users connect to MEAD website
- Log on (user ID and password)
- Update profile (optional)
- System matches user with another user that matches their desired chat partner profile
- Opens chat text window on both sides
- Users engage in chat dialogue, with MT translating messages
- Either user can terminate the chat connection (or timeout disconnects them)
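The partner-matching step could be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not say what a profile contains, so the `Profile` fields (`language`, `wants_language`, `interests`) and the `find_partner` function are hypothetical names, and the tie-breaking rule (most shared interests) is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Profile:
    """Hypothetical user profile; the real profile fields are not given in the slides."""
    user_id: str
    language: str            # language the user writes in
    wants_language: str      # language of the desired chat partner
    interests: Set[str] = field(default_factory=set)


def find_partner(user: Profile, waiting: List[Profile]) -> Optional[Profile]:
    """Return the waiting user who best fits `user`'s desired partner profile.

    A candidate qualifies only if the language preferences match in both
    directions; ties are broken by the number of shared interests.
    """
    candidates = [
        other for other in waiting
        if other.user_id != user.user_id
        and other.language == user.wants_language
        and other.wants_language == user.language
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda other: len(user.interests & other.interests))


waiting_users = [
    Profile("u2", "ar", "en", {"music"}),
    Profile("u3", "ar", "en", {"football", "films"}),
    Profile("u4", "en", "ar", {"football"}),
]
alice = Profile("u1", "en", "ar", {"football", "films"})
partner = find_partner(alice, waiting_users)
print(partner.user_id if partner else "no match yet")
```

A user for whom no compatible partner is waiting gets `None` back and would simply stay in the waiting pool.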
3Envisioned General Architecture
- Distributed Architecture
- Central web server (or multiple servers)
- Communication Mediator Server (CMS) controls the communication channel of each active chat connection
- Machine Translation Server (MTS) provides MT translations for individual messages
- CMS and MTS can be physically located anywhere on the internet, communicate using TCP/IP over the
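To make the division of labour between the CMS and the MTS concrete, here is a minimal in-process sketch. It assumes nothing about the real wire protocol: `ChatSession`, `stub_mts`, and the `Translator` signature are hypothetical names, and the MTS is replaced by a stub that only tags the text so the example runs without any network or MT engine.

```python
from typing import Callable, Dict, List, Tuple

# A translator takes (text, source_lang, target_lang) and returns translated text.
Translator = Callable[[str, str, str], str]


def stub_mts(text: str, src: str, tgt: str) -> str:
    """Stand-in for the Machine Translation Server: tags text instead of translating."""
    return f"[{src}->{tgt}] {text}"


class ChatSession:
    """Hypothetical mediator-side state for one active chat connection."""

    def __init__(self, langs: Dict[str, str], translate: Translator):
        if len(langs) != 2:
            raise ValueError("a chat connection has exactly two users")
        self.langs = langs                            # user id -> language code
        self.translate = translate
        self.active = True
        self.delivered: List[Tuple[str, str]] = []    # (recipient, text) log

    def send(self, sender: str, text: str) -> str:
        """Translate `text` into the other user's language and deliver it."""
        if not self.active:
            raise RuntimeError("chat connection is closed")
        if sender not in self.langs:
            raise KeyError(f"unknown user: {sender}")
        recipient = next(u for u in self.langs if u != sender)
        translated = self.translate(text, self.langs[sender], self.langs[recipient])
        self.delivered.append((recipient, translated))
        return translated

    def terminate(self) -> None:
        """Either user (or a timeout) may end the connection."""
        self.active = False


session = ChatSession({"u1": "en", "u3": "ar"}, stub_mts)
print(session.send("u1", "Hello, how are you?"))
session.terminate()
```

Passing the translator in as a callable mirrors the idea that the MTS is a separate service: the mediator only needs something it can call per message, wherever that service actually runs.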
4General Architecture
5Machine Translation
- Features and Requirements
- Bi-directional translation between English and Arabic
- Real-time MT
- Conversational dialogue effects
- Dialect effects
- Typos and spelling errors
6Existing CMU MT Technology
- Example-based MT (EBMT)
- Statistical MT (SMT)
- Transfer-based MT (XFER)
- Multi-Engine combinations (MEMT)
7Approaches to MT: Vauquois MT Triangle
- Interlingua: give-information+personal-data (name=alon_lavie)
- Transfer: s vp accusative_pronoun chiamare proper_name -> s np possessive_pronoun name vp be
- Direct: Mi chiamo Alon Lavie -> My name is Alon Lavie
8EBMT Paradigm
New Sentence (Source):
- Yesterday, 200 delegates met with President Clinton.
Matches to Source:
- Yesterday, 200 delegates met behind closed doors
- Difficulties with President Clinton
- Gestern trafen sich 200 Abgeordnete hinter verschlossenen
- Schwierigkeiten mit Praesident
Alignment (Sub-sentential):
- Yesterday, 200 delegates met behind closed doors
- Difficulties with President Clinton over
- Gestern trafen sich 200 Abgeordnete hinter verschlossenen
- Schwierigkeiten mit Praesident
Translated Sentence (Target):
- Gestern trafen sich 200 Abgeordnete mit Praesident Clinton.
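The match, align, and recombine steps of this example can be imitated with a toy script. It is a sketch under strong assumptions, not the CMU EBMT engine: the aligned fragments in `EXAMPLE_BASE` are hard-coded from the slide rather than extracted from a corpus, `translate_by_example` is a hypothetical name, and a greedy longest-match cover stands in for real decoding.

```python
from typing import Dict, List, Optional

# Toy example base: source fragments with their aligned target fragments.
# A real EBMT system finds these sub-sentential alignments automatically
# in a parallel corpus; here they are copied by hand from the slide.
EXAMPLE_BASE: Dict[str, str] = {
    "Yesterday, 200 delegates met": "Gestern trafen sich 200 Abgeordnete",
    "with President Clinton": "mit Praesident Clinton",
    "Difficulties": "Schwierigkeiten",
}


def translate_by_example(sentence: str, examples: Dict[str, str]) -> Optional[str]:
    """Cover the input left to right with the longest known source fragments."""
    words = sentence.rstrip(".").split()
    out: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest span starting at position i first.
        for j in range(len(words), i, -1):
            fragment = " ".join(words[i:j])
            if fragment in examples:
                out.append(examples[fragment])
                i = j
                break
        else:
            return None  # no stored example covers the word at position i
    return " ".join(out) + "."


print(translate_by_example(
    "Yesterday, 200 delegates met with President Clinton.", EXAMPLE_BASE))
```

The greedy cover fails as soon as one word is not in the example base, which is one reason a practical system keeps many overlapping candidate fragments and chooses among them instead of committing to the first match.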
9Statistical MT (SMT)
- Proposed by IBM in early 1990s: a direct, purely statistical, model for MT
- Statistical translation models are trained on a sentence-aligned translation corpus
- Attractive: completely automatic, no manual rules, much reduced manual labor
- Main drawbacks
- Effective only with huge volumes (several mega-words) of parallel text
- Very domain-sensitive
- Still viable only for small number of language pairs!
- Impressive progress in last 3-4 years due to large DARPA funding program (TIDES)
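For reference, the IBM formulation is usually written as a noisy-channel model. The notation here is the conventional one and does not come from the slides: f is the source sentence, e a candidate target sentence.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(e)\, P(f \mid e)
```

P(e) is a language model over target sentences and P(f | e) is the translation model estimated from the sentence-aligned corpus, which is why the amount of parallel text matters so much.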
10Transfer with Strong Decoding
12Multi-Engine MT
- Apply several MT engines to each input; use statistical language modeller to select best combination of outputs.
- Goal is to combine strengths, and avoid weaknesses.
- Along all dimensions: domain limits, quality, development time/cost, run-time speed, etc.
- Used in various projects
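The selection idea can be illustrated with a toy language model. This sketch simplifies in two ways that should be read as assumptions: it picks one whole engine output rather than combining fragments from several engines, and the add-one-smoothed bigram model, the training sentences, and the engine outputs are all made up for the example. `train_bigram_lm` and `select_best` are hypothetical names.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def train_bigram_lm(corpus: List[str]) -> Callable[[str], float]:
    """Return an add-one-smoothed bigram scorer (mean log-probability per bigram)."""
    histories: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        histories.update(tokens[:-1])                 # counts of bigram histories
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len({w for s in corpus for w in s.lower().split()} | {"</s>"})

    def score(sentence: str) -> float:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        logp = 0.0
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            logp += math.log((bigrams[(prev, cur)] + 1) / (histories[prev] + vocab_size))
        return logp / (len(tokens) - 1)               # normalise by length

    return score


def select_best(outputs: Dict[str, str], score: Callable[[str], float]) -> str:
    """Return the name of the engine whose output the language model prefers."""
    return max(outputs, key=lambda engine: score(outputs[engine]))


lm = train_bigram_lm([
    "my name is alon",
    "my name is lori",
    "what is your name",
])
engine_outputs = {                    # made-up outputs, for illustration only
    "EBMT": "my name is alon",
    "SMT": "me call alon",
    "XFER": "name my alon is",
}
print(select_best(engine_outputs, lm))
```

Length normalisation keeps the scorer from always favouring the shortest candidate, a common concern when outputs of different engines differ in length.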
13Senior Personnel
- Alon Lavie: PI and Project Manager
- Distributed Architecture, System Integration
- Lori Levin: co-PI
- Linguistic Resources
- Violetta Cavalli-Sforza: co-PI
- Data Collection, Informants, System Testing
- Nizar Habash: co-PI
- Issues in Arabic Processing, Encodings, Dialects
14Project Workplan
- Project duration: 2 years
- Year-1: Prototype development
- Develop and integrate all components
- Full functionality, but limited coverage and quality
- Demo system
- Year-2: Working System development
- Language coverage
- Translation quality
- User interfaces
- Volume of simultaneous users
- System testing
- Goal is to deploy system at end of year-2!!
17Example-based MT (EBMT)
- Developed originally for the PANGLOSS system in the early 1990s
- Translation between English and Spanish
- Generalized EBMT under development for the past several years
- Currently one of the two MT approaches developed at CMU for the DARPA/TIDES program
- Chinese-to-English, large and very large amounts of sentence-aligned parallel data
- Active research work on improving alignment and indexing, decoding from a lattice
- Contact Faculty: Ralf Brown and Jaime Carbonell
18Statistical MT
- Word-to-word and phrase-to-phrase translation pairs are acquired automatically from data and assigned probabilities based on a statistical model
- Extracted and trained from very large amounts of sentence-aligned parallel text
- Word alignment algorithms
- Phrase detection algorithms
- Translation model probability estimation
- Main approach pursued in CMU systems in the DARPA/TIDES program
- Chinese-to-English and Arabic-to-English
- Most active work is on phrase detection and on advanced lattice decoding
- Contact Faculty: Stephan Vogel and Alex Waibel
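As an illustration of how word-to-word translation probabilities can be acquired automatically from sentence-aligned text, here is a compact EM trainer in the style of IBM Model 1. It is a textbook technique chosen for the example and is not claimed to be what the CMU systems use. The three-sentence German-English corpus is a toy, the NULL word is left out for brevity, and `train_ibm_model1` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def train_ibm_model1(pairs: List[Tuple[str, str]],
                     iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate word translation probabilities t(f | e) with EM (IBM Model 1).

    `pairs` holds (source, target) sentence pairs. Keys of the result are
    (source_word, target_word) tuples for words that co-occur in some pair.
    """
    corpus = [(f.split(), e.split()) for f, e in pairs]
    f_vocab = {f for fs, _ in corpus for f in fs}
    # Start from a uniform distribution over source words.
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) links
        total = defaultdict(float)   # expected count of links involving e
        for fs, es in corpus:
            for f in fs:
                # E-step: split one unit of probability for f among all e.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts per target word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)


toy_corpus = [
    ("das Haus", "the house"),
    ("das Buch", "the book"),
    ("ein Buch", "a book"),
]
table = train_ibm_model1(toy_corpus)
for (f, e), p in sorted(table.items(), key=lambda item: -item[1])[:4]:
    print(f"t({f} | {e}) = {p:.2f}")
```

Even on three sentence pairs the co-occurrence pattern is enough for EM to favour the intended pairings. Phrase detection and probability estimation for phrase pairs typically build on word alignments of this kind.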