Title: AVENUE: Automatic Machine Translation for low-density languages
1 AVENUE: Automatic Machine Translation for low-density languages
- Ariadna Font Llitjós
- Language Technologies Institute
- SCS
- Carnegie Mellon University
2 Two HCI project proposals
- Interface to Online Bilingual and Multilingual Dictionaries
- Translation Correction Tool: interface design, implementation and user studies
3 Online Bilingual and Multilingual Dictionaries
- Bilingual and multilingual dictionaries for indigenous languages (Mapudungun in Chile; Inupiaq in Alaska; Aymara, Quechua and Aguaruna in Peru)
- For each bilingual/multilingual dictionary, we have (or will have) an Excel database created by the local teams (for Mapudungun, built from a spoken corpus transcribed and translated into Spanish)
5 Online Bilingual and Multilingual Dictionaries (cont.)
- For each entry, we give the translation in Spanish, some other linguistic information (POS), and a link to the actual sentence where it appears in the corpus. For example:
- Püñpüñkünuukey: se manifiesta en forma de ronchas
- nmlch-nmfhp1_x_0031_nmfhp_00
- Mapu: Fey itrofillpüle kuerpu, ta pichike püñpüñkünuukey ta kalül may, peñi.
- Sp: Así es en todas partes del cuerpo, pequeñas ronchas se forman en el cuerpo pues, hermano
7 Online Bilingual and Multilingual Dictionaries (cont.)
- Currently, users can search for
- Mapudungun words
- Spanish words
- all the words starting with a letter
- all the words containing a word or a string of
characters
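The four search modes listed above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the `Entry` class, its field names and the function names are assumptions, and the single record is the example entry shown on the earlier slide (its POS is not given there, so it is left empty).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    mapudungun: str            # headword
    spanish: str               # Spanish translation
    sentence_id: str           # link to the corpus sentence
    pos: Optional[str] = None  # part of speech (not given on the slide)


# Hypothetical in-memory table; a real one would be loaded from the
# Excel database created by the local team.
ENTRIES = [
    Entry(
        mapudungun="püñpüñkünuukey",
        spanish="se manifiesta en forma de ronchas",
        sentence_id="nmlch-nmfhp1_x_0031_nmfhp_00",
    ),
]


def by_mapudungun(entries, word):
    """Exact, case-insensitive match on the Mapudungun headword."""
    key = word.casefold()
    return [e for e in entries if e.mapudungun.casefold() == key]


def by_spanish(entries, word):
    """Match a whole word inside the Spanish translation."""
    key = word.casefold()
    return [e for e in entries if key in e.spanish.casefold().split()]


def starting_with(entries, letter):
    """All headwords that start with the given letter (or prefix)."""
    key = letter.casefold()
    return [e for e in entries if e.mapudungun.casefold().startswith(key)]


def containing(entries, text):
    """All entries whose headword or translation contains the string."""
    key = text.casefold()
    return [e for e in entries
            if key in e.mapudungun.casefold() or key in e.spanish.casefold()]
```

Each function returns a list of entries, so the web interface could render the translation, the POS and the corpus link from the same record whichever search mode was used.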
8 Online Bilingual and Multilingual Dictionaries (cont.)
- Primary users
- people in the indigenous communities
- researchers in these countries, inside and outside the indigenous communities
- Chilean case
- product of the Ministry of Education
- students and teachers, mostly Mapuche, but maybe some Spanish-speaking users as well
9 Online Bilingual and Multilingual Dictionaries (cont.)
- Secondary users
- Linguistics, Lexicography and Anthropology researchers from all over the world
- random people browsing the web
10 Online Dictionaries: Tasks for HCII project
- analyze the design of the basic web interface
- given a query for a word in either language, it presents the information for that entry to the user in the other language
- how to incorporate an audio file with the word as it was pronounced in the spoken corpus
- how to make it interactive, i.e. have bilingual users comment on the entries and possibly add new entries (need profile info)
11 Translation Correction Tool (TCTool)
- AVENUE is a project that develops Automatic Machine Translation Systems for low-density languages
- Since translations are automatic, i.e. not perfect, we need to refine them
- Instead of having a professional translator, we want to find an automatic way to refine the output of the MTS -> TCTool
12 TCTool
- We can use the TCTool to automatically learn a refinement of the Transfer rules in our MTS from users' input
- Challenges
- users most likely not familiar with computers -> user-friendly and intuitive interface
- bilingual informants can't be assumed to have any linguistic knowledge
13 Automatic Machine Translation
[Figure: diagram with the labels analysis, Transfer rules, Corpus-based methods, Interlingua, interpretation, generation]
14 Automatic Learning of a Transfer-based MTS
[Figure: diagram with the labels Elicitation corpus, SVS algorithm, tentative Transfer rules, Transfer module, Rule Refinement module, SL sentences, (tentative) TL sentences]
15 Interactive and Automatic rule refinement
- Interactive step (TCTool)
- Given an MTS, translate sentences and present them to the users for minimal correction (interface design, MT error classification)
- Automatic step
- Machine learning DS and algorithms to map user input to refined transfer rules
16 User studies snapshot
19 TCTool: Tasks for HCII project
- analyze the design of the basic web interface
- given a translated sentence, it asks the user to minimally correct it, if incorrect, and to classify the error(s)
- how to explain what minimal correction is
- what is the right error classification for non-expert and non-linguist users
- Can naïve users reliably pinpoint the source of errors?
- design user studies to show the reliability of user input (Spanish to English, English to Spanish, English to Chinese)
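One way to make "minimal correction" concrete is to compare the MT output with the user's corrected sentence word by word and keep only the words that changed. The sketch below is an assumption about how such a comparison could be done, not the TCTool's actual method: it uses Python's standard `difflib.SequenceMatcher`, the function names are hypothetical, and the English sentence pair is invented for illustration.

```python
from difflib import SequenceMatcher


def word_edits(mt_output, corrected):
    """Return the word-level edits that turn the MT output into the correction.

    Each edit is (operation, old_words, new_words), where operation is
    'replace', 'insert' or 'delete' as reported by difflib.
    """
    old, new = mt_output.split(), corrected.split()
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def edit_size(edits):
    """Number of words touched; a smaller value means a more minimal correction."""
    return sum(max(len(old), len(new)) for _, old, new in edits)


# Invented example: the user fixes a single agreement error.
edits = word_edits("the girl are tall", "the girl is tall")
print(edits)             # [('replace', ['are'], ['is'])]
print(edit_size(edits))  # 1
```

A user study could log these edits next to the error class the user selected, so that the size of each correction and the reliability of the classification can be examined together.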
20 AVENUE project members
- LTI team
- Researchers: Jaime Carbonell, Lori Levin, Alon Lavie, Ralf Brown
- Ph.D. students: Ariadna Font Llitjós, Christian Monson, Erik Peterson, Katharina Probst
- AVENUE External Project Coordinator
- Rodolfo M Vega
- Chilean team
- Eliseo Cañulef, Luis Caniupil Huaiquiñir
- Hugo Carrasco, Marcela Collio Calfunao
- Rosendo Huisca, Cristian Carrillan Anton
- Hector Painequeo, Salvador Cañulef
- Flor Caniupil, Claudio Millacura
21 Questions?
- For more information:
- http://www.cs.cmu.edu/aria/avenue/