Title: Centro per la Ricerca Scientifica e Tecnologica
1- Spoken Language Technologies: recent advances and future challenges
Gianni Lazzari, Vienna, July 26
2- Summary
- Short introduction to SLT
- Where are we today?
- TC-STAR and RAI projects
- Outlook for the future
Focus on the use of Spoken Language Technologies for multilingual transcription and reporting tasks.
3- Typical tasks in Human Language Technologies (HLT)
- speech recognition (voice commands, speech transcription)
- character recognition
- object and gesture recognition
- (spoken and written) language understanding
- spoken dialog systems
- speech synthesis
- text summarization
- document classification and information retrieval
- syntactic analysis of natural language
- speech and text translation
- ...
4- General Spoken Language System Architecture
[Figure: input → Recognition → Understanding and dialog → Generation and Synthesis → answer. Models: acoustic, language, semantic, dialog, synthesis.]
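The architecture above can be read as a chain of stages, each backed by its own models. The following is only a toy sketch under stated assumptions: "audio" is simulated as a text string, and every component (`toy_recognize`, `toy_understand`, `toy_dialog`, `toy_synthesize`) is a hypothetical stand-in for a model-based module, not part of any real system.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpokenLanguageSystem:
    # Each stage is a plain function; real systems back them with trained models.
    recognize: Callable[[str], str]    # input "audio" -> word string (acoustic + language models)
    understand: Callable[[str], dict]  # words -> semantic frame (semantic model)
    dialog: Callable[[dict], str]      # frame -> answer text (dialog model)
    synthesize: Callable[[str], str]   # answer text -> output "speech" (synthesis model)

    def respond(self, audio: str) -> str:
        words = self.recognize(audio)
        frame = self.understand(words)
        answer = self.dialog(frame)
        return self.synthesize(answer)

# Hypothetical toy components, for illustration only.
def toy_recognize(audio: str) -> str:
    return audio.lower().strip()

def toy_understand(words: str) -> dict:
    return {"intent": "weather" if "weather" in words else "unknown"}

def toy_dialog(frame: dict) -> str:
    return "it is sunny" if frame["intent"] == "weather" else "please repeat"

def toy_synthesize(text: str) -> str:
    return f"<speech>{text}</speech>"

system = SpokenLanguageSystem(toy_recognize, toy_understand, toy_dialog, toy_synthesize)
print(system.respond("What is the WEATHER like?"))  # <speech>it is sunny</speech>
```

Keeping each stage behind a plain function signature makes it easy to swap one model-based module without touching the others.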
5- Speech Transcription System Architecture
[Figure: input audio (noise, speech, music, ...) → Recognition → results: enriched text. Models: acoustic, language, speakers, speech, music, noise.]
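In a transcription system the input audio mixes speech with music and noise, so only speech is passed to recognition, and the output text is enriched with extra information. A minimal sketch, assuming segments have already been labelled by a hypothetical audio classifier and speaker-clustering step, with timestamps and speaker IDs as the "enrichment"; `recognize` is a placeholder, not a real decoder:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float
    label: str     # "speech", "music" or "noise" (assumed output of a hypothetical classifier)
    speaker: str   # assumed speaker-cluster ID
    audio: str     # stand-in for raw samples: here, already the spoken words

def recognize(audio: str) -> str:
    # Placeholder for the ASR decoder (acoustic + language models).
    return audio.lower()

def transcribe(segments: list[Segment]) -> list[str]:
    """Keep only speech, recognize it, and enrich each line with time and speaker."""
    lines = []
    for seg in segments:
        if seg.label != "speech":
            continue  # music and noise segments are skipped, not transcribed
        lines.append(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.speaker}: {recognize(seg.audio)}")
    return lines

segments = [
    Segment(0.0, 4.0, "music", "-", ""),
    Segment(4.0, 9.5, "speech", "S1", "Good evening from Rome"),
    Segment(9.5, 10.0, "noise", "-", ""),
    Segment(10.0, 15.0, "speech", "S2", "Thank you"),
]
for line in transcribe(segments):
    print(line)
```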
6- Typical Transcription System
7- Standard Automatic Speech Recognition Architecture
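Only the title of this slide survives. In the usual statistical formulation (assumed here), a front end turns the signal into acoustic observations X, and a decoder searches for the word sequence W that maximizes P(W) P(X | W), combining a language model with HMM-based acoustic models; Viterbi decoding is a common core of that search. Below is a minimal Viterbi sketch over a toy discrete HMM. The states, symbols and probabilities are hypothetical, and real acoustic models use continuous densities over feature vectors rather than discrete symbols.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely hidden-state sequence for a discrete HMM (log domain)."""
    # delta[s]: best log score of any state path ending in s at the current frame
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, delta, bp = delta, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            delta[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            bp[s] = best
        backpointers.append(bp)
    # Trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy model (hypothetical numbers): two states, discrete acoustic symbols "x" and "y"
L = math.log
states = ["a", "b"]
log_start = {"a": L(0.8), "b": L(0.2)}
log_trans = {"a": {"a": L(0.6), "b": L(0.4)}, "b": {"a": L(0.1), "b": L(0.9)}}
log_emit = {"a": {"x": L(0.9), "y": L(0.1)}, "b": {"x": L(0.2), "y": L(0.8)}}

path, score = viterbi(["x", "x", "y", "y"], states, log_start, log_trans, log_emit)
print(path, round(math.exp(score), 6))  # ['a', 'a', 'b', 'b'] 0.08958
```

Working in the log domain turns products of probabilities into sums, which avoids numerical underflow on long utterances.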
8- Word error rate of different speech recognition tasks
- Dictation: 7% (well formed; computer; FBW)
- Broadcast news: 12% (various; audience; FBW)
- Switchboard: 20-30% (spontaneous; person; TBW)
- Voicemail: 30% (spontaneous; person; TBW)
- Meetings: 50-60% (spontaneous; person; FF)
The features characterizing these tasks are:
- type of speech: well formed vs. spontaneous
- target of communication: computer, audience, person
- bandwidth:
  - FBW, full bandwidth
  - TBW, telephone bandwidth
  - FF, far field
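The word error rates above are the standard measure: the minimum number of substitutions, deletions and insertions needed to turn the recognized word string into the reference, divided by the number of reference words. A minimal sketch using word-level Levenshtein distance; the example sentences are invented, and scoring tools used in evaluations also apply text normalization, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)

ref = "the commission adopted the proposal"
hyp = "the commission adopted a proposal today"
print(f"{100 * word_error_rate(ref, hyp):.1f}%")  # 40.0% (one substitution, one insertion)
```

Because insertions are counted, WER can exceed 100% on very poor output, which is why it is an error rate rather than a fraction of words wrong.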
9- RAI Italian Broadcast News Transcription
10- Evaluation of the Italian broadcast news transcription task
- Acoustic models are trained with a speaker-adaptive acoustic modelling procedure.
- Two sets of acoustic models were trained, one for wideband and one for narrowband speech, each using about 140 hours of speech.
- The LM was estimated on a 226M-word corpus made up mostly of newspaper articles, plus broadcast news (BN) transcripts.
- The LM is compiled into a static network with a shared-tail topology.
11- Word error rate on the Italian broadcast news transcription task
12- STATISTICAL TRANSLATION BASED ON BAYESIAN DECISION RULE
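The decision rule itself was not transcribed from this slide. In the usual noisy-channel notation (assumed here), with f the source sentence and e a candidate target sentence, Bayes' theorem rewrites the posterior; the denominator Pr(f) does not depend on e, so it drops out of the maximization:

```latex
\hat{e} = \operatorname*{argmax}_{e} \Pr(e \mid f)
        = \operatorname*{argmax}_{e} \frac{\Pr(e)\,\Pr(f \mid e)}{\Pr(f)}
        = \operatorname*{argmax}_{e} \Pr(e)\,\Pr(f \mid e)
```

Here Pr(e) is the target language model and Pr(f | e) the translation model.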
13- Statistical Translation System
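A statistical translation system scores each candidate target sentence by combining a target language model and a translation model, then keeps the best one. A toy sketch, assuming hypothetical probabilities for three English candidates for the Spanish phrase "la casa verde"; real systems generate candidates through search rather than enumerating a fixed list.

```python
import math

# Hypothetical scores, for illustration only.
# lm[e] ~ Pr(e): target language model
# tm[e] ~ Pr(f | e): translation model for f = "la casa verde"
lm = {
    "the green house": 0.020,
    "the house green": 0.002,
    "green the house": 0.001,
}
tm = {
    "the green house": 0.30,
    "the house green": 0.40,  # word-for-word, so the translation model alone prefers it
    "green the house": 0.30,
}

def decide(candidates, lm, tm):
    """Bayes decision rule in the log domain: argmax_e log Pr(e) + log Pr(f | e)."""
    return max(candidates, key=lambda e: math.log(lm[e]) + math.log(tm[e]))

print(decide(list(lm), lm, tm))  # the green house
```

The translation model alone would pick the word-for-word "the house green"; the language model term is what restores a fluent word order.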
14- Research Cycle
[Figure: stages Basic Research, Technology Development, Application Development, Usage Evaluation. Links: research needed for improving technology; quantitative evaluation; technologies needed for applications; bottleneck identification; research results in quantitative evaluation; technologies validated for applications. Annotations: long term, high risk, large ROI; usability, acceptability; evolutionary.]
16- Experimental findings in HLT research (1973-2004)
- statistical methods most successful
  - in particular speech recognition, language translation, parsing, dialog systems, ...
- scientific foundations
  - methods of computer science, statistical modelling, information theory
- handling huge amounts of data
  - 200 hours of speech recordings, 100 million running words, ...
- learning from data
  - fully automatic procedures
  - more data than can be processed by human experts
- efficient algorithms
  - search/decision algorithms for heuristic search
  - ...
17- Research on HLT 1973-2004
- speech recognition (1973-2004)
  - most of the progress by pure statistical modelling
  - some progress by weak acoustic-phonetic-linguistic knowledge, i.e. domain-specific knowledge
  - virtually no progress by classical rule-based and AI methods
- similar recent experience (1993-2004)
  - machine translation, information extraction, dialog systems, ...
- expectation for future progress in HLT
  - most important methodology: computer science, statistical modelling, information theory
  - domain-specific knowledge: acoustics, phonetics, linguistics, ...
18- Spoken language translation joint projects (national, European, international): ATR, C-Star, Verbmobil, Eutrans, Nespole!, Fame, LC-Star, PF-Star, TC-STAR
- restricted domains
  - appointment scheduling, conference registration, travelling, tourism information, ...
- vocabulary size: 3,000 to 10,000 words
- best performing systems and approaches: data-driven
  - example-based methods
  - finite-state transducers
  - statistical approaches
  - e.g. Verbmobil evaluation, June 2000: better by a factor of 2
- written language translation: US Tides project 2001-2004
  - unrestricted domain: press news, vocabulary size 50,000 words
  - language pairs: Chinese to English, Arabic to English
  - performance, July 2003: the best statistical systems are better than conventional/commercial systems
19- TC-STAR: Technology and Corpora for Speech to Speech Translation
EU Sixth Framework Programme (FP6), priority: Multimodal Interfaces, IST-2002-2.3.1.6
Contract No. FP6 506738
20- Partners
21- TC-STAR
- The TC-STAR project focuses on advanced research in key technologies for speech-to-speech translation:
  - speech recognition (ASR)
  - spoken language translation (SLT)
  - speech synthesis (TTS)
- Start: April 2004
- End: March 2007
- Grant: 11 million euro
- Methodology:
  - competitive evaluation
  - cooperation
22- Vision
- Transcription and translation of broadcast news, speeches, lectures and interviews
[Figure: simultaneous translation, vocal access, web access]
23- Application Scenario
- A selection of unconstrained conversational speech domains:
  - broadcast news
  - European Parliament plenary sessions
- A few languages important for European society and economy:
  - European-accented English
  - European Spanish
  - Chinese
24- 2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK
- The evaluation tasks and databases (translation tasks):
  - English to Spanish: EPPS (European Parliament Plenary Sessions)
  - Spanish to English: EPPS (European Parliament Plenary Sessions)
- Three types of input to SLT:
  - output of automatic speech recognition
  - verbatim manual transcriptions
  - final text editions (with punctuation marks)
25- 2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK
- Training data: sentence-aligned speeches and their translations
  - final text editions from April 1996 to Oct. 4, 2004
  - verbatim transcriptions from May 2004 to Oct. 4, 2004
- Development data: Oct. 26, 2004
- Evaluation data: Nov. 14, 2004
26- 2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK
27- 2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK
- ASR on EPPS data, word error rate (WER):
  - European-accented English: 9.5% (best TC-STAR result)
  - European Spanish: 10.1% (best TC-STAR result)
- SLT on EPPS data, position-independent word error rate (PER):
  - English to Spanish: 49% (best partner result)
  - Spanish to English: 46% (best partner result)
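The SLT results above use the position-independent word error rate, which compares hypothesis and reference as bags of words so that a correct word in the wrong place is not penalized. A minimal sketch of one common formulation (matches counted by multiset intersection, surplus hypothesis words counted as errors); the exact variant used in the TC-STAR evaluation is not stated here, and the example sentences are invented.

```python
from collections import Counter

def position_independent_error_rate(reference: str, hypothesis: str) -> float:
    """PER: like WER, but word order is ignored (bag-of-words matching)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Multiset intersection: hypothesis words that can be matched to reference words
    matches = sum((Counter(ref) & Counter(hyp)).values())
    # Unmatched words, counting surplus hypothesis words as insertions
    errors = max(len(ref), len(hyp)) - matches
    return errors / len(ref)

ref = "the president opened the session"
print(position_independent_error_rate(ref, "the session opened the president"))  # 0.0
print(position_independent_error_rate(ref, "the president opened a session today"))  # 0.4
```

In the first example only the word order differs, so PER is 0, whereas a word-order-sensitive WER would count two substitutions. PER is therefore a lower bound that isolates lexical choice from reordering errors.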
28- "The spoken translation problem ... is still a significant challenge. Good text translation was hard enough to pull off. Speech-to-speech MT was beyond going to the Moon: it was Mars."
Steve Silberman, Wired Magazine