Title: Tanja Schultz
1Multilinguality in Automatic Speech Recognition
Systems
- Tanja Schultz
- Carnegie Mellon University, LTI
- ESSLLI workshop, Trento, August 14, 2002
2Outline
- Multilinguality in Automatic Speech Recognition (ASR)
- Motivation, Topics
- Monolingual ASR in many languages
- Challenges, Language peculiarities
- Language Independent ASR
- Language Portability
- Rapid deployment of LVCSR systems
- What is done, what needs to be solved
- Applications of multilingual systems
3Motivation Multilinguality in ASR
- Myth: Everyone speaks English, why bother?
- NO: About 4500 different languages in the world, increasing number of languages on the Internet (as an indicator of growing diversity)
- Computerization and globalization feed the need for (civilian and military) applications in many languages
- Language diversity and digital discrimination: rapid deployment of systems in many languages
- Myth: It's just retraining on foreign data, no science
- NO: Other languages bring unseen challenges, e.g. scripts, vocabulary and morphology, tonality, less than ideal resources, few if any speech/text data
- => Automatic speech recognition in many languages: Human-to-Human communication, Human-Computer interfaces
4Topics in Multilinguality
- Monolingual ASR systems in many languages
- Language Portability
- Combine Data and Knowledge of many languages
- Faster cycles: years -> days
- Fewer data: low-density languages
- ASR systems: Acoustic Model, Dictionary, LM
- Multilingual applications
- Multilingual assistance / user interfaces
- Foreign Accents - non-native speech
- Language, Accent, and Speaker Identification
5Projects addressing Multilinguality
6Available Multilingual Resources
- What do we need for ASR: audio, text, dictionary
- Data distribution, webpage and catalogues online
- http://www.icp.inpg.fr/ELRA/home.html
- http://www.ldc.upenn.edu/
- LDC: Broadcast News (En, Ch, Sp, Cz, Ja, Ar), CallFriend, CallHome (13 languages, only few transcripts)
- ELRA: SpeechDat, SpeechDat-car, Aurora, Verbmobil, Polyphone, Accor, SpeeCon
- About 20 languages extensively studied so far
- Interactive Systems Labs: GlobalPhone
7GlobalPhone
- Multilingual Database
- Widespread languages
- Native speakers
- Uniformity
- Broad domain
- Huge text resources
- Internet newspapers
- Total sum of resources
- 15 languages so far
- About 300 hours of speech data
- About 1400 native speakers
Arabic, Ch-Mandarin, Ch-Shanghai, Czech, Croatian,
French, German, Japanese, Korean, Portuguese,
Russian, Spanish, Swedish, Tamil, Turkish
Soon available from ELRA!
8Language Peculiarities
- Prosody: tonal languages like Mandarin
- Sound system: simple vs very complex systems
- Phonotactics: simple syllable structure vs complex clusters
- Scripts, letter-to-sound (l-2-s): simple 1:1 mapping vs pictographs
- Morphology, Segmentation
- Natural segmentation into units suitable for LVCSR (English)
- Compounds (German)
- Donau-dampf-schiffahrts-gesellschafts-kapitän
- The captain of the company that operates the steamboats on the Danube River
- Word phrases due to morphological structure (Turkish, Korean)
- Osman-lı-laş-tır-ama-yabil-ecek-ler-imiz-den-miş-siniz
- Behaving as if you were of those whom we might consider not converting into Ottoman
- No segmentation at all (Chinese, Japanese)
9Monolingual Recognizers in 10 Languages
10Language Independent ASR
- Can we build a language independent ASR system?
- Universal (Language Independent) Acoustic Modeling
- Sound production is human, NOT language specific
- International Phonetic Alphabet (IPA): simple to implement, easy to port to new languages
- Fully data-driven procedure: considers spectral properties and similarities, applies to context independent and dependent models
- Universal Language Modeling
- Combine LMs of languages to allow code switching
- Experiments
- Train language dependent and independent AM and LM
- Evaluate in monolingual and multilingual mode
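The IPA-based approach to universal acoustic modeling can be illustrated with a minimal sketch: language-specific phonemes that map to the same IPA symbol are pooled into one shared unit. The inventories, phoneme names, and function names below are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical toy inventories: language-specific phoneme name -> IPA symbol.
# Names and coverage are illustrative assumptions, not data from the talk.
INVENTORIES = {
    "German": {"a": "a", "sch": "ʃ", "ue": "y", "m": "m"},
    "Spanish": {"a": "a", "ch": "tʃ", "m": "m"},
    "Turkish": {"a": "a", "s_": "ʃ", "u_": "y", "m": "m"},
}


def build_global_units(inventories):
    """Group language-specific phonemes that share one IPA symbol."""
    units = defaultdict(list)
    for language, phonemes in inventories.items():
        for name, ipa in phonemes.items():
            units[ipa].append((language, name))
    return dict(units)


def shared_units(units):
    """Return the IPA symbols that are covered by more than one language."""
    return sorted(
        ipa
        for ipa, members in units.items()
        if len({language for language, _ in members}) > 1
    )


units = build_global_units(INVENTORIES)
shared = shared_units(units)
print(shared)
```

Units listed in `shared` could be trained on pooled data from all contributing languages, while the remaining units stay language specific. The data-driven alternative named on the slide would replace the IPA lookup with a similarity measure over the trained models.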
11Language Independent AM
12Language Portability
13Language Portability AM
- Model mapping to the target language
- 1) Map the multilingual phonemes to Portuguese ones based on the IPA scheme
- 2) Copy the corresponding acoustic models in order to initialize the Portuguese models
- Problem: Contexts are language specific; how can context dependent models be applied to a new target language?
- Solution: Adaptation of multilingual contexts to the target language based on limited training data
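The two mapping steps can be sketched as follows, assuming the multilingual models are indexed by IPA symbol. The model store, the Portuguese phoneme names, and the parameter lists are hypothetical placeholders; the sketch covers only context independent initialization, not the adaptation of contexts.

```python
# Hypothetical multilingual model store: IPA symbol -> model parameters
# (a short list stands in for real acoustic model parameters).
multilingual_models = {
    "a": [0.1, 0.2],
    "m": [0.3, 0.4],
    "ʃ": [0.5, 0.6],
}

# Hypothetical Portuguese phoneme set: target phoneme name -> IPA symbol.
portuguese_to_ipa = {"PT_a": "a", "PT_m": "m", "PT_x": "ʃ", "PT_lh": "ʎ"}


def initialize_target_models(target_to_ipa, source_models):
    """Step 1: look up each target phoneme by its IPA symbol.
    Step 2: copy the matching multilingual model as the initial model.
    Phonemes without an IPA match are returned separately."""
    initialized, unmatched = {}, []
    for phoneme, ipa in target_to_ipa.items():
        if ipa in source_models:
            # Copy the parameters so later adaptation does not alter the source.
            initialized[phoneme] = list(source_models[ipa])
        else:
            unmatched.append(phoneme)
    return initialized, unmatched


initialized, unmatched = initialize_target_models(
    portuguese_to_ipa, multilingual_models
)
print(sorted(initialized), unmatched)
```

Phonemes that end up in `unmatched` have no counterpart in the multilingual set and would need a different starting point, for example the closest available sound.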
14Language Portability Experiments
15Language Portability Dictionary
16Language Portability Dictionary
- Task: In each language a pronunciation has to be generated for each word
- Problem: Hand-crafting is very expensive; a rule-based approach requires a letter-to-sound relationship and linguistic knowledge
- Letter-to-Sound relationship
- Solutions: Fully automatic dictionary generation
- Apply letter-to-sound rules where possible
- Use phoneme recognizers in different languages
- Learn sound units and dict units from scratch
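Applying letter-to-sound rules where possible can be sketched as a greedy longest-match lookup that reports the words it cannot cover. The rule table, phoneme symbols, and function names are hypothetical; real rule sets are language specific and usually context sensitive.

```python
# Hypothetical rule table: grapheme string -> phoneme symbol (illustrative only).
RULES = {"ch": "tS", "ll": "L", "a": "a", "c": "k", "o": "o", "l": "l", "e": "e"}


def letter_to_sound(word, rules):
    """Greedy longest-match conversion; returns None if a letter has no rule."""
    max_len = max(len(grapheme) for grapheme in rules)
    phonemes, i = [], 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phonemes.append(rules[chunk])
                i += size
                break
        else:
            return None  # no rule matched at position i
    return phonemes


def build_dictionary(words, rules):
    """Return (pronunciations, uncovered words) for a word list."""
    entries, uncovered = {}, []
    for word in words:
        pronunciation = letter_to_sound(word, rules)
        if pronunciation is None:
            uncovered.append(word)
        else:
            entries[word] = pronunciation
    return entries, uncovered


entries, uncovered = build_dictionary(["calle", "choco", "taco"], RULES)
print(entries, uncovered)
```

Words in `uncovered` are candidates for the other strategies on the slide, such as phoneme recognizers in different languages.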
17Language Portability LM
18Conclusions Todos
- Applications are needed in many languages
- Many applications like S-2-S require Multilingual Systems
- Monolingual ASR
- Language Peculiarities: Morphology, Segmentation, Scripts, ...
- Language Independent ASR
- AM: very useful for independent and adaptive applications
- Potential for non-native / accented speech
- LM: universal language model allows code switching
- Language Portability
- Language adaptive AMs reduce the need for new data
- What needs to be solved, improved, ...
- Dictionary, LMing, task-specific vs large vocab