Title: Centro per la Ricerca Scientifica e Tecnologica


1
  • Centro per la Ricerca Scientifica e Tecnologica

Spoken language technologies: recent advances and
future challenges
Gianni Lazzari, Vienna, July 26
2
  • SUMMARY
  • Short introduction on SLT
  • Where are we today?
  • TC-STAR and RAI projects
  • Outlook for the future

Focus on the use of Spoken Language Technologies
for multilingual transcription and reporting
tasks
3
Typical tasks in Human Language Technologies
(HLT)
  • speech recognition (voice commands, speech
    transcription)
  • character recognition
  • object and gesture recognition
  • (spoken and written) language understanding
  • spoken dialog systems
  • speech synthesis
  • text summarization
  • document classification and information retrieval
  • syntactic analysis of natural language
  • speech and text translation
  • ...

4
General Spoken Language System Architecture
Models: acoustic, language, semantic, dialog, synthesis
Stages from input to answer: Recognition,
Understanding and dialog, Generation and Synthesis
5
Speech Transcription System Architecture
Models: acoustic, language, speakers, speech, music,
noise
Input: audio (noise, speech, music, ...)
Recognition
Result: enriched text
6
Typical Transcription System
7
Standard Automatic Speech Recognition
Architecture
8
Word error rate of different speech recognition
tasks
  • Dictation: 7%, well formed, computer, FBW
  • Broadcast news: 12%, various, audience, FBW
  • Switchboard: 20-30%, spontaneous, person, TBW
  • Voicemail: 30%, spontaneous, person, TBW
  • Meetings: 50-60%, spontaneous, person, FF
  • The features characterizing these tasks are:
  • type of speech: well formed vs. spontaneous
  • target of communication: computer, audience,
    person
  • bandwidth:
  • FBW, full bandwidth
  • TBW, telephone bandwidth
  • FF, far field
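
Word error rate, the metric reported on this slide, is conventionally the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that definition; the function name and the whitespace tokenization are assumptions, not part of the presentation, and real evaluations normally apply text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate in percent (illustrative sketch)."""
    ref = reference.split()   # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words.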

9
RAI Italian Broadcast news Transcription
10
Evaluation of the Italian broadcast news
transcription task.
  • Acoustic models are trained through a
    speaker-adaptive acoustic modelling procedure
  • Two sets of acoustic models were trained, for
    wideband and narrowband speech, using about
    140 hours of speech for each set.
  • The LM was estimated on a 226M-word corpus
    consisting mostly of newspaper articles, plus
    BN transcripts.
  • The LM is compiled into a static network with a
    shared-tail topology.

11
Word error rate on the Italian broadcast news
transcription task.
12
STATISTICAL TRANSLATION BASED ON BAYESIAN
DECISION RULE
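
The slide body is not in the transcript. As a reminder of what the title usually refers to, the Bayes decision rule for translation is commonly written as below. The symbols are assumptions chosen for illustration: f is the source sentence, e a candidate target sentence, Pr(e) the language model and Pr(f | e) the translation model.

```latex
\hat{e} \;=\; \arg\max_{e} \Pr(e \mid f)
        \;=\; \arg\max_{e} \Pr(e)\,\Pr(f \mid e)
```

The second equality follows from Bayes' theorem, since Pr(f) does not depend on e and can be dropped from the maximization.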
13
Statistical Translation System
14
Research Cycle
Basic Research
Technology Development
Application Development
Usage Evaluation
Research needed for improving technology
Quantitative Evaluation
Technologies needed for applications
Bottleneck Identification
Research results in quantitative evaluation
Technologies validated for applications.
Long term, high risk, large ROI
Usability, acceptability
Evolutionary
16
Experimental findings in HLT research (1973-2004)
  • statistical methods are the most successful
  • in particular in speech recognition, language
    translation, parsing, dialog systems, ...
  • scientific foundations
  • methods of computer science, statistical
    modelling, information theory
  • handling huge amounts of data
  • 200 hours of speech recordings, 100 million
    running words, ...
  • learning from data
  • fully automatic procedures
  • more data than can be processed by human experts
  • efficient algorithms
  • search/decision algorithms for heuristic search
  • ...

17
Research on HLT 1973-2004
  • speech recognition (1973-2004)
  • most of the progress by pure statistical
    modelling
  • some progress by weak acoustic-phonetic-linguistic
    knowledge, i.e. domain-specific knowledge
  • virtually no progress by classical rule-based
    and AI methods
  • similar recent experience (1993-2004)
  • machine translation, information extraction,
    dialog systems, ...
  • expectation for future progress in HLT
  • most important methodology
  • computer science, statistical modelling,
    information theory
  • domain-specific knowledge
  • acoustics, phonetics, linguistics, ...

18
Spoken language translation: joint projects
(national, European, international): ATR, C-Star,
Verbmobil, Eutrans, Nespole!, Fame, LC-Star,
PF-Star, TC-STAR
  • restricted domains
  • appointment scheduling, conference registration,
    travelling, tourism information, ...
  • vocabulary size: 3 000 to 10 000 words
  • best performing systems and approaches:
    data-driven
  • example-based methods
  • finite-state transducers
  • statistical approaches
  • e.g. Verbmobil evaluation, June 2000: better
    by a factor of 2
  • written language translation: US Tides project,
    2001-2004
  • unrestricted domain: press news, vocabulary size
    50 000 words
  • language pairs: Chinese to English, Arabic to
    English
  • performance, July 2003:
  • best statistical systems are better than
    conventional/commercial systems

19
TC-STAR: Technology and Corpora for Speech to
Speech Translation
VI Framework Program, priority: Multimodal
Interfaces, IST-2002-2.3.1.6
Contract Nr. FP6 506738
20
PARTNERS
21
TC-STAR
  • The TC-STAR project focuses on advanced
    research in key technologies for speech to
    speech translation:
  • speech recognition (ASR)
  • spoken language translation (SLT)
  • speech synthesis (TTS)
  • Start: April 2004
  • End: March 2007
  • Grant: 11 M Euro
  • METHODOLOGY
  • COMPETITIVE EVALUATION
  • COOPERATION

22
Vision
  • Transcription and Translation of broadcast news,
    speeches, lectures and interviews

Hi, What do you think about
Simultaneous Translation
Vocal access
Web access
23
Application Scenario
  • A selection of unconstrained conversational
    speech domains
  • Broadcast news
  • European Parliament Plenary Session
  • A few languages important for European society
    and economy
  • European Accented English
  • European Spanish
  • Chinese

24
2005 FIRST EVALUATION RESULTS ON THE EUROPEAN
PARLIAMENT PLENARY SESSION TASK
  • The evaluation tasks and databases: translation
    tasks
  • English to Spanish
  • EPPS: European Parliament Plenary Sessions
  • Spanish to English
  • EPPS: European Parliament Plenary Sessions
  • Three types of input to SLT:
  • output of automatic speech recognition
  • verbatim manual transcriptions
  • final text editions (with punctuation marks)

25
2005 FIRST EVALUATION RESULTS ON THE EUROPEAN
PARLIAMENT PLENARY SESSION TASK
  • Training data:
  • Sentence-aligned speeches and
    their translations
  • Final text editions from
    April 1996 to Oct. 4th, 2004
  • Verbatim transcriptions from
    May 2004 to Oct. 4th, 2004
  • Development data: Oct. 26, 2004
  • Evaluation data: Nov. 14, 2004

26
2005 FIRST EVALUATION RESULTS ON THE EUROPEAN
PARLIAMENT PLENARY SESSION TASK
27
2005 FIRST EVALUATION RESULTS ON THE EUROPEAN
PARLIAMENT PLENARY SESSION TASK
  • ASR, EPPS data (word error rate, WER)
  • European accented English: 9.5% (best TC-STAR)
  • European Spanish: 10.1% (best TC-STAR)
  • SLT, EPPS data (position-independent WER)
  • English to Spanish: 49% (best partner result)
  • Spanish to English: 46% (best partner result)
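
The position-independent word error rate used for the SLT figures ignores word order and compares the hypothesis and the reference as bags of words. The slides do not give the exact formula used in the evaluation, so the sketch below assumes one common formulation: the number of unmatched words, counted against the longer of the two sentences, divided by the reference length. The function name and the whitespace tokenization are also assumptions.

```python
from collections import Counter


def position_independent_error_rate(reference: str, hypothesis: str) -> float:
    """Return the position-independent WER in percent (illustrative sketch)."""
    ref = reference.split()   # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Multiset intersection: each word matches at most as many times
    # as it occurs in both sentences, regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())

    # Words left unmatched, measured against the longer sentence.
    errors = max(len(ref), len(hyp)) - matches
    return 100.0 * errors / len(ref)
```

Under this formulation a hypothesis that contains exactly the reference words in a different order scores 0, which is why the measure is usually reported alongside the ordinary word error rate rather than instead of it.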

28
The spoken translation problem is still a
significant challenge.
"Good text translation was hard enough to pull off.
Speech to speech MT was beyond going to the Moon;
it was Mars."
Steve Silbermann, Wired Magazine.