AVENUE: Automatic Machine Translation for low-density languages

1
AVENUE: Automatic Machine Translation for
low-density languages
  • Ariadna Font Llitjós
  • Language Technologies Institute
  • SCS
  • Carnegie Mellon University

2
Two HCI project proposals
  • Interface to Online Bilingual and Multilingual
    Dictionaries
  • Translation Correction Tool: interface design,
    implementation and user studies

3
Online Bilingual and Multilingual Dictionaries
  • bilingual and multilingual dictionaries for
    indigenous languages (Mapudungun in Chile; Inupiaq
    in Alaska; Aymara, Quechua and Aguaruna in Peru)
  • For each bilingual/multilingual dictionary, we
    (will) have an Excel database created by the
    local teams (for Mapudungun, built from a spoken
    corpus transcribed and translated into Spanish)

4
(No Transcript)
5
Online Bilingual and Multilingual Dictionaries
(cont.)
  • For each entry, we give the translation in
    Spanish, some other linguistic information (POS),
    and a link to the actual sentence where it
    appears in the corpus. For example:
  • Püñpüñkünuukey: se manifiesta en forma de
    ronchas ("it shows up as welts")
  • nmlch-nmfhp1_x_0031_nmfhp_00
  • Mapu: Fey itrofillpüle kuerpu, ta pichike
    püñpüñkünuukey ta kalül may, peñi.
  • Sp: Así es en todas partes del cuerpo, pequeñas
    ronchas se forman en el cuerpo pues, hermano
    ("That is how it is all over the body: small
    welts form on the body, brother")

6
(No Transcript)
7
Online Bilingual and Multilingual Dictionaries
(cont.)
  • Currently, users can search for:
  • Mapudungun words
  • Spanish words
  • all the words starting with a letter
  • all the words containing a word or a string of
    characters
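The four search modes listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the `Entry` record, its field names, and the `search` function are hypothetical, the part-of-speech field is left empty because the slides do not give its values, and the second entry is an illustrative row inferred from the example sentence rather than taken from the real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary row (hypothetical layout, not the project's schema)."""
    mapudungun: str   # headword
    spanish: str      # Spanish translation
    pos: str          # part of speech; left empty where unknown
    sentence_id: str  # ID of the corpus sentence the word appears in


ENTRIES = [
    # Entry quoted in the slides; its POS is not given there.
    Entry("Püñpüñkünuukey", "se manifiesta en forma de ronchas", "",
          "nmlch-nmfhp1_x_0031_nmfhp_00"),
    # Illustrative row only, not taken from the real database.
    Entry("peñi", "hermano", "", ""),
]


def search(entries, query, field="mapudungun", mode="exact"):
    """Return the entries whose `field` matches `query` under `mode`.

    exact    - the whole field equals the query
    word     - the query is one of the whitespace-separated words
    prefix   - the field starts with the query (e.g. a single letter)
    contains - the query occurs anywhere in the field
    """
    q = query.casefold()

    def matches(text):
        t = text.casefold()
        if mode == "exact":
            return t == q
        if mode == "word":
            return q in t.split()
        if mode == "prefix":
            return t.startswith(q)
        if mode == "contains":
            return q in t
        raise ValueError(f"unknown search mode: {mode!r}")

    return [e for e in entries if matches(getattr(e, field))]
```

For instance, `search(ENTRIES, "ronchas", field="spanish", mode="word")` looks up a Spanish word, and `search(ENTRIES, "p", mode="prefix")` lists the Mapudungun words starting with a given letter. Comparing case-folded strings keeps the lookup case-insensitive; a real interface would probably also need Unicode normalization for characters such as "ü" and "ñ".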

8
Online Bilingual and Multilingual Dictionaries
(cont.)
  • Primary users:
  • people in the indigenous communities
  • researchers in these countries, inside and
    outside the indigenous communities
  • Chilean case:
  • a product of the Ministry of Education
  • students and teachers, mostly Mapuche, but possibly
    some Spanish speakers as well

9
Online Bilingual and Multilingual Dictionaries
(cont.)
  • Secondary users:
  • linguistics, lexicography and anthropology
    researchers from all over the world
  • casual visitors browsing the web

10
Online Dictionaries: Tasks for HCII project
  • analyze design of the basic web interface
  • given a query for a word in either language, it
    presents the information for that entry to the
    user in the other language.
  • how to incorporate an audio file with the word as
    it was pronounced in the spoken corpus.
  • how to make it interactive, i.e. have bilingual
    users comment on the entries and possibly add new
    entries (need profile info)

11
Translation Correction Tool (TCTool)
  • AVENUE is a project which develops Automatic
    Machine Translation Systems (MTS) for low-density
    languages
  • Since translations are automatic, i.e. not
    perfect, we need to refine them.
  • instead of relying on a professional translator, we
    want to find an automatic way to refine the
    output of the MTS -> TCTool

12
TCTool
  • We can use the TCTool to automatically learn a
    refinement of the Transfer rules in our MTS from
    users' input
  • Challenges:
  • users are most likely not familiar with computers ->
    user-friendly and intuitive interface
  • bilingual informants can't be assumed to have any
    linguistic knowledge

13
Automatic Machine Translation
[Diagram labels: Interlingua; Transfer rules; Corpus-based methods; analysis; interpretation; generation]
14
Automatic Learning of a Transfer-based MTS
[Diagram labels: Elicitation corpus; SVS algorithm; tentative Transfer rules; SL sentences; Transfer module; (tentative) TL sentences; Rule Refinement module]
15
Interactive and Automatic rule refinement
  • Interactive step (TCTool):
  • Given an MTS, translate sentences and present
    them to the users for minimal correction
    (interface design, MT error classification)
  • Automatic step:
  • Machine learning DS and algorithms to map user
    input to refined transfer rules
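As a rough illustration of the interactive step, the sketch below compares an MT output with the user's corrected sentence and lists only the word-level changes. It is an assumption-laden stand-in, not the TCTool or the rule refinement module: the function name `correction_edits` is hypothetical, the sentences in the usage note are invented, and Python's standard `difflib.SequenceMatcher` is used here simply as a convenient alignment routine.

```python
from difflib import SequenceMatcher


def correction_edits(mt_output: str, corrected: str):
    """List the word-level edits that turn the MT output into the user's
    corrected sentence.

    Returns (operation, original_words, corrected_words) tuples, where
    operation is "replace", "delete" or "insert". Unchanged stretches
    are skipped, so an untouched sentence gives an empty list.
    """
    source = mt_output.split()
    target = corrected.split()
    # autojunk=False disables difflib's frequent-token heuristic, which
    # only matters for long inputs but keeps the behaviour predictable.
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            edits.append((op, source[i1:i2], target[j1:j2]))
    return edits
```

With the invented pair "the houses is red" and "the house is red", the function reports a single replacement of "houses" by "house". A short edit list of this kind is one plausible form for the user input that the automatic step would then have to relate to transfer rules; the error class itself would still come from the user.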

16
User studies snapshot
17
(No Transcript)
18
(No Transcript)
19
TCTool: Tasks for HCII project
  • analyze design of the basic web interface
  • given a translated sentence, it asks the user to
    minimally correct it, if incorrect, and to
    classify the error(s).
  • how to explain what minimal correction is
  • what is the right error classification for
    non-expert and non-linguist users
  • Can naïve users reliably pinpoint the source of
    errors?
  • design user studies to show reliability of user
    input (Spanish -> English, English -> Spanish,
    English -> Chinese)

20
AVENUE project members
  • LTI team
  • Researchers: Jaime Carbonell, Lori Levin,
    Alon Lavie, Ralf Brown
  • Ph.D. students: Ariadna Font Llitjós, Christian
    Monson, Erik Peterson, Katharina Probst
  • AVENUE External Project Coordinator: Rodolfo M. Vega
  • Chilean team
  • Eliseo Cañulef, Luis Caniupil Huaiquiñir,
    Hugo Carrasco, Marcela Collio Calfunao,
    Rosendo Huisca, Cristian Carrillan Anton,
    Hector Painequeo, Salvador Cañulef,
    Flor Caniupil, Claudio Millacura

21
Questions?
  • For more information
  • http://www.cs.cmu.edu/aria/avenue/