Title: Machine Translation with Scarce Resources
1Machine Translation with Scarce Resources
2Scarce Resources
- Not much text in electronic form.
- Very few linguists who can write computational rules.
- No standard orthography
  - Kudaw, kusaw (work) (Mapudungun, Chile)
- Not even sure of pronunciation
  - EH-nvelope, AH-nvelope (envelope) (English, US, not a language with scarce resources)
3Our Approach
- Learn rules from a controlled corpus.
- Corpus is elicited from bilingual speakers.
- The informant only needs to translate and align
words.
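The informant's output (a translation plus word alignments) can be pictured as a small record. The following is a minimal sketch, not the project's actual data format: the names `ElicitedPair` and `aligned_words` are hypothetical, and the alignment for "The rock fell" / "Trani chi kura" is an illustrative assumption rather than elicited data.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ElicitedPair:
    """One elicited sentence pair with informant-supplied word alignments.

    Hypothetical structure for illustration only.
    """
    source: List[str]                      # tokens of the elicitation sentence
    target: List[str]                      # tokens of the informant's translation
    # Each (i, j) links source token i to target token j (0-based).
    alignment: List[Tuple[int, int]] = field(default_factory=list)

    def aligned_words(self) -> List[Tuple[str, str]]:
        """Return aligned (source word, target word) pairs in source order."""
        return [(self.source[i], self.target[j]) for i, j in sorted(self.alignment)]


# Assumed alignment for illustration: The->chi, rock->kura, fell->Trani.
pair = ElicitedPair(
    source=["The", "rock", "fell"],
    target=["Trani", "chi", "kura"],
    alignment=[(0, 1), (1, 2), (2, 0)],
)

for src_word, tgt_word in pair.aligned_words():
    print(src_word, "<->", tgt_word)
```

Storing alignments as index pairs rather than word pairs keeps repeated words unambiguous and allows one-to-many links.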
4AVENUE Project
- New Ideas
- Use machine learning to learn translation rules from native speakers who are not trained in linguistics or computer science.
- Multi-Engine translation architecture can flexibly take advantage of whatever resources are available.
- Research partnerships with indigenous communities in Latin America and Alaska (Mapudungun (Chile), Siona (Colombia), Inupiaq (Alaska))
Interface for data elicitation
- Impact
- Rapid and low-cost development of machine translation for languages with scarce resources.
- Policy makers can get input from indigenous people.
- Indigenous people can participate in government and on the internet.
Schedule
- Year 1: Seeded Version Space learning, first version
- Year 2: Example-Based Machine Translation of Mapudungun (Chile)
- Year 3: Multi-Engine Mapudungun system (EBMT and partially learned transfer rules)
Carnegie Mellon University, Language Technologies Institute
L. Levin, J. Carbonell, A. Lavie, R. Brown
5Elicitation Interface
6Elicitation Corpus example
- English: I fell.
- Spanish: Caí
- Mapudungun: Tranün

- English: I am falling.
- Spanish: Estoy cayendo
- Mapudungun: Tranmeken
7Elicitation Corpus example
- English: You (John) fell.
- Spanish: Tú (Juan) caíste
- Mapudungun: Eymi tranimi (Kuan)

- English: You (Mary) fell.
- Spanish: Tú (María) caíste
- Mapudungun: Eymi tranimi (Maria)

- English: The rock fell.
- Spanish: La piedra cayó
- Mapudungun: Trani chi kura
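
The elicitation examples are built as near-minimal pairs, so the Mapudungun verb forms differ only where the English meaning differs. As a rough illustration of why that helps, the sketch below compares the four verb forms from the examples with a naive longest-common-prefix check. This is plain string comparison, not a morphological analysis and not the project's learning method; the helper name `shared_prefix` is hypothetical.

```python
import os


def shared_prefix(forms):
    """Longest common prefix of the given word forms, ignoring case."""
    return os.path.commonprefix([form.lower() for form in forms])


# Verb forms taken from the elicitation examples above.
verb_forms = ["Tranün", "Tranmeken", "tranimi", "Trani"]

common = shared_prefix(verb_forms)
# What remains of each form once the shared prefix is removed.
remainders = [form[len(common):] for form in verb_forms]

print("shared prefix:", common)
for form, rest in zip(verb_forms, remainders):
    print(form, "->", common, "+", rest)
```

The shared prefix only hints at a common stem; deciding what each remainder means still requires the aligned English and Spanish sentences.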