MEAD Project Technical Overview

1
MEAD Project Technical Overview
  • Alon Lavie
  • November 19, 2004

2
Expected Functionality
  • Users connect to MEAD website
  • Log on (user ID and password)
  • Update profile (optional)
  • System matches the user with another user who
    fits their desired chat partner profile
  • Opens chat text window on both sides
  • Users engage in chat dialogue, with MT
    translating messages
  • Either user can terminate the chat connection (or
    timeout disconnects them)
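
The matching step above can be illustrated with a minimal sketch. The slides do not say which profile fields are used or how matching is done, so the `User` fields (`language`, `wants_language`) and the first-compatible-user rule in `find_partner` are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    language: str        # language the user writes in (assumed profile field)
    wants_language: str  # language of the desired chat partner (assumed field)


def find_partner(user, waiting):
    """Return the first waiting user whose profile and wishes are mutually compatible."""
    for other in waiting:
        if (other.user_id != user.user_id
                and other.language == user.wants_language
                and other.wants_language == user.language):
            return other
    return None  # nobody suitable is waiting yet


waiting = [User("u1", "en", "en"), User("u2", "ar", "en")]
partner = find_partner(User("u3", "en", "ar"), waiting)
print(partner.user_id if partner else "no match")
```

A deployed system would presumably match on a richer profile and handle queueing and timeouts; this sketch only shows the mutual-compatibility check.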

3
Envisioned General Architecture
  • Distributed Architecture
  • Central web server (or multiple servers)
  • Communication Mediator Server (CMS) controls the
    communication channel of each active chat
    connection
  • Machine Translation Server (MTS) provides MT
    translations for individual messages
  • CMS and MTS can be physically located anywhere on
    the internet and communicate using TCP/IP
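
The division of labour between CMS and MTS can be sketched in a few lines. This is an in-process illustration only: the class names `CMS` and `StubMTS`, their methods, and the placeholder "translation" are all hypothetical, and the TCP/IP link between the two servers is replaced here by a direct method call.

```python
class StubMTS:
    """Stand-in for the Machine Translation Server (a real one would sit behind TCP/IP)."""

    def translate(self, text, src, tgt):
        # Placeholder output, not a real translation.
        return f"[{src}->{tgt}] {text}"


class CMS:
    """Communication Mediator: tracks active chats and routes each message through the MTS."""

    def __init__(self, mts):
        self.mts = mts
        self.chats = {}    # chat_id -> {user_id: language}
        self.inboxes = {}  # user_id -> list of delivered messages

    def open_chat(self, chat_id, users):
        self.chats[chat_id] = dict(users)
        for uid in users:
            self.inboxes.setdefault(uid, [])

    def send(self, chat_id, sender, text):
        langs = self.chats[chat_id]
        for uid, lang in langs.items():
            if uid != sender:
                translated = self.mts.translate(text, langs[sender], lang)
                self.inboxes[uid].append(translated)

    def close_chat(self, chat_id):
        self.chats.pop(chat_id, None)


cms = CMS(StubMTS())
cms.open_chat("c1", {"alice": "en", "bashir": "ar"})
cms.send("c1", "alice", "hello")
print(cms.inboxes["bashir"])
```

Keeping translation behind a single `translate` call is what would let the MTS run on a different machine from the CMS, as the slide describes.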

4
General Architecture
5
Machine Translation
  • Features and Requirements
  • Bi-directional translation between English and
    Arabic
  • Real-time MT
  • Conversational dialogue effects
  • Dialect effects
  • Typos and spelling errors

6
Existing CMU MT Technology
  • Example-based MT (EBMT)
  • Statistical MT (SMT)
  • Transfer-based MT (XFER)
  • Multi-Engine combinations (MEMT)

7
Approaches to MT: Vauquois MT Triangle
  • Interlingua (apex, reached by Analysis and left by
    Generation):
    give-information+personal-data (name=alon_lavie)
  • Transfer (middle):
    s vp accusative_pronoun chiamare proper_name
    to s np possessive_pronoun name vp be proper_name
  • Direct (base): "Mi chiamo Alon Lavie" to
    "My name is Alon Lavie"

8
EBMT Paradigm
  • New Sentence (Source): Yesterday, 200 delegates
    met with President Clinton.
  • Matches to Source Found:
    Yesterday, 200 delegates met behind closed doors
      Gestern trafen sich 200 Abgeordnete hinter
      verschlossenen
    Difficulties with President Clinton
      Schwierigkeiten mit Praesident Clinton
  • Alignment (Sub-sentential):
    Yesterday, 200 delegates met behind closed doors
      Gestern trafen sich 200 Abgeordnete hinter
      verschlossenen
    Difficulties with President Clinton over
      Schwierigkeiten mit Praesident Clinton
  • Translated Sentence (Target): Gestern trafen sich
    200 Abgeordnete mit Praesident Clinton.
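
The EBMT example above can be mimicked with a toy sketch: cover the new sentence with the longest known source fragments and concatenate their aligned targets. The `FRAGMENTS` table is written by hand from the slide's example and stands in for alignments a real EBMT system would extract from its example base; the greedy longest-match strategy is an assumption, not a description of the CMU engine.

```python
# Hand-written fragment alignments (an assumption for illustration only).
FRAGMENTS = {
    ("yesterday", ",", "200", "delegates", "met"): "Gestern trafen sich 200 Abgeordnete",
    ("with", "president", "clinton"): "mit Praesident Clinton",
}


def translate(sentence):
    """Greedy longest-match cover of the input with known source fragments."""
    words = sentence.lower().replace(",", " ,").rstrip(".").split()
    out, i = [], 0
    while i < len(words):
        # Try the longest fragment that starts at position i.
        for j in range(len(words), i, -1):
            target = FRAGMENTS.get(tuple(words[i:j]))
            if target is not None:
                out.append(target)
                i = j
                break
        else:
            out.append(words[i])  # no example covers this word; pass it through
            i += 1
    return " ".join(out) + "."


print(translate("Yesterday, 200 delegates met with President Clinton."))
```

A real system would also have to reorder fragments and choose between competing matches, which this sketch does not attempt.
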
9
Statistical MT (SMT)
  • Proposed by IBM in the early 1990s: a direct,
    purely statistical model for MT
  • Statistical translation models are trained on a
    sentence-aligned translation corpus
  • Attractive: completely automatic, no manual
    rules, much reduced manual labor
  • Main drawbacks: effective only with huge volumes
    (several mega-words) of parallel text; very
    domain-sensitive
  • Still viable only for a small number of language
    pairs!
  • Impressive progress in the last 3-4 years due to
    a large DARPA funding program (TIDES)

10
Transfer with Strong Decoding
12
Multi-Engine MT
  • Apply several MT engines to each input; use a
    statistical language model to select the best
    combination of outputs
  • Goal is to combine strengths and avoid weaknesses
    along all dimensions: domain limits, quality,
    development time/cost, run-time speed, etc.
  • Used in various projects
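
The selection idea can be sketched with a tiny bigram language model that scores each engine's output. This is a simplification: it picks one whole output rather than combining fragments from several engines, and the training corpus, the engine names, and the add-one smoothing are all assumptions for illustration.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Return a function giving a length-normalised log-probability under a bigram LM."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(toks[:-1])           # counts of bigram contexts
        bigrams.update(zip(toks, toks[1:]))
    vocab = len({w for s in corpus for w in s.split()} | {"<s>", "</s>"})

    def logprob(sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        # Add-one smoothing so unseen bigrams get a small non-zero probability.
        lp = sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                 for a, b in zip(toks, toks[1:]))
        return lp / (len(toks) - 1)

    return logprob


corpus = ["my name is alon", "what is your name", "my name is lori"]
score = train_bigram_lm(corpus)

# Hypothetical outputs from two MT engines for the same input.
candidates = {"engine_a": "name my is alon", "engine_b": "my name is alon"}
best = max(candidates, key=lambda k: score(candidates[k]))
print(best, candidates[best])
```

The language model knows nothing about the source sentence, so it can only judge fluency; adequacy has to come from the engines themselves.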

13
Senior Personnel
  • Alon Lavie (PI and Project Manager): Distributed
    Architecture, System Integration
  • Lori Levin (co-PI): Linguistic Resources
  • Violetta Cavalli-Sforza (co-PI): Data Collection,
    Informants, System Testing
  • Nizar Habash (co-PI): Issues in Arabic Processing,
    Encodings, Dialects

14
Project Workplan
  • Project duration: 2 years
  • Year 1, prototype development: develop and
    integrate all components; full functionality,
    but limited coverage and quality; demo system
  • Year 2, working system development: language
    coverage, translation quality, user interfaces,
    volume of simultaneous users, system testing
  • Goal is to deploy the system at the end of year 2

15
Questions?
17
EBMT
  • Developed originally for the PANGLOSS system in
    the early 1990s, for translation between English
    and Spanish
  • Generalized EBMT has been under development for
    the past several years
  • Currently one of the two MT approaches developed
    at CMU for the DARPA/TIDES program:
    Chinese-to-English, with large and very large
    amounts of sentence-aligned parallel data
  • Active research work on improving alignment and
    indexing, and on decoding from a lattice
  • Contact faculty: Ralf Brown and Jaime Carbonell

18
Statistical MT
  • Word-to-word and phrase-to-phrase translation
    pairs are acquired automatically from data and
    assigned probabilities based on a statistical
    model
  • Extracted and trained from very large amounts of
    sentence-aligned parallel text, using word
    alignment algorithms, phrase detection
    algorithms, and translation model probability
    estimation
  • Main approach pursued in CMU systems in the
    DARPA/TIDES program, for Chinese-to-English and
    Arabic-to-English
  • Most active work is on phrase detection and on
    advanced lattice decoding
  • Contact faculty: Stephan Vogel and Alex Waibel
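
To make "word alignment" and "translation model probability estimation" concrete, here is a minimal sketch of IBM Model 1 trained with EM. Model 1 is a standard textbook word-alignment model and is used here only as an example; the slides do not say which models the CMU systems use. The three-sentence German-English corpus is a toy assumption, and the NULL word is omitted for brevity.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(f | e) with EM.

    pairs: list of (source_words, target_words) from sentence-aligned text.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform starting point
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e
        # E-step: distribute each source word over the target words.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate t(f | e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = ibm_model1(pairs)
print(round(t[("haus", "house")], 2), round(t[("das", "house")], 2))
```

Even on three sentence pairs, co-occurrence across sentences is enough for EM to prefer "haus" over "das" as the translation of "house"; phrase detection would build on word alignments of this kind.
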