Title: Current Status and Future Challenges
1 Current Status and Future Challenges - a Speech Technology View
Assoc. Prof. Børge Lindberg
Speech and Multimedia Communication Division (Center for PersonKommunikation (CPK) is now integrated into the Department of Communication Technology)
Aalborg University, Denmark
E-mail: lindberg_at_kom.auc.dk
2 Current status: speech recognition
- A corpus-based technology
- Statistical models derived from acoustic and linguistic databases!
- Assisted by significant media development (storage capacity) and hardware development (computational capacity)
- Appearance of data consortia:
  - Linguistic Data Consortium (LDC), www.ldc.upenn.edu
  - European Language Resources Association (ELRA), www.icp.grenet.fr/ELRA/home.html
- Recently, powerful commercial speech recognition engines have become available
3 Current status: speech recognition
- Applications in Danish
- Mainly in the telephony domain (traffic information, calling rates). Providers: Nuance, Philips (now Scansoft)/PDC
- Penetration of call centre automation is lower than in, e.g., the US
- Recently, in the office environment, specialised domains such as pathology and radiology have also become available; more are planned. Provider: Philips/MaxManus
- No computer command and control recogniser is available for Danish, and no general-domain speech recognition is available. A database will soon be available from the SPEECON project
- The latter two are required for the physically handicapped and for people with reading, writing or spelling disabilities
- Other players are IBM/NST (ViaVoice), Empathy Systems and Scansoft (Dragon)
- Lack of robustness
4 Current status: speech synthesis (TTS)
- Like ASR, a corpus-based technology
- With lower computational demands on the core engine
- Applications are up and running in Danish: SMS-to-speech, phone browsing, adgangforalle.dk (access for all), automatic generation of audio books
- Dedicated recorded database for the target voice, or a larger database for trainable synthesis
- Three products available (Infovox, RealSpeak and DanTTS)
- Inflexible voice
- Limited prosodic control and limited NLP capacity
5 Major results achieved in Denmark
- As a result of public funding
- Text-to-speech synthesis is there, to a large extent because of public funding
- Language resources have been developed for speech recognition (SpeechDat family); these are available for research, but at a fairly high price
- Automatic Speech Recognition (ASR) is coming up (though the main drive is from industry)
- Denmark is lagging behind with respect to ASR
6 What went wrong?
- Sequential development of TTS and ASR: TTS first .... This strategy postponed the development of ASR in Denmark, assisted by commercial players claiming they were able to do so.
- No follow-up public investment in ASR
- Databases are there but hardly accessible
- No Open-Source-like modules available for core technologies
7 Challenges and what is needed?
- Development of applications with no business case, mainly for the disabled: public involvement, i.e. investment, is needed
- Speech recognisers and text-to-speech synthesisers lack robustness and a number of other qualities. Public involvement is needed to ensure continued research in new methods applied to the Danish language, or at least to ensure an Open-Source-like situation for core technologies
- Public involvement in the distribution, collection, validation, standardisation, improvement and production of language resources, ideally freely accessible
- Without support for the smaller languages, EU speech technology will develop along two lines: languages that allow full use, and languages that don't
- To benefit from the free flow of researchers within the EU, for that reason alone, we need to strive to create and maintain attractive research environments that facilitate high-level research
- Research into multi-modality (speech is one of several modalities), multi-linguality and, in particular, non-native language