Title: Getting started
1Preparing Future multiSensorial inTerAction
Research
2Outline
- Introduction
- Project objectives
- Participants
- Objectives, Scenarios, Approaches for each WP
- Expected results
- Main steps
3Introduction
Multilingual and Multisensorial Communication (MMC)
- Speech-to-speech translation
- Detection and expression of emotional states
- Core speech technologies for children
PFSTAR intends to help put future activities in
the field of MMC on firmer bases by providing
technological baselines, comparative evaluations,
and assessments of the prospects of core
technologies, which future research and
development efforts can build on.
4Project objectives
- The project builds on years of research
already conducted in several national and
international research projects (NESPOLE!,
C-STAR, Verbmobil, SmartKom). PFSTAR wants to
improve on, refine, stabilise, and align current
achievements to turn them into true technological
baselines along with careful assessments and
evaluations.
5Project objectives
- The goal of this project is to help advance
research and lay the foundations for future
efforts on the topic of Multilingual and
Multisensorial Communication.
6Participants
- Istituto Trentino di Cultura, Centro per la
Ricerca Scientifica e Tecnologica (ITC-irst)
- Interactive Systems Laboratories at Universitaet
Karlsruhe (UKA)
- Institute for Pattern Recognition of
Friedrich-Alexander Universitaet
Erlangen-Nurnberg (UERLN)
- Department of Electronic, Electrical and Computing
Engineering of the University of Birmingham (UB)
- Kungl Tekniska Hogskolan (KTH)
- RWTH Computer Science Department
- Istituto di Scienze e Tecnologie della
Cognizione, Sezione di Padova Fonetica e
Dialettologia, CNR
7WP2 Technologies for speech translation
Objectives
- Comparative evaluation and integration of
different technological baselines for speech to
speech translation over a range of application
scenarios.
8WP2 Technologies for speech translation
Scenarios
- Human to human interaction (tourism and
traveling domains)
- Document translation (open domain)
- Cross-language information retrieval
(open domain)
9WP2 Technologies for speech translation
- Interlingua-based approaches
- Direct translation approaches (statistical
models)
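To make the "statistical models" bullet concrete, here is a minimal sketch of one common formulation of direct translation, the noisy-channel search, which picks the target sentence e maximising P(e) * P(f | e) for a source sentence f. The slides do not say which formulation the project uses, so this is an assumption; the probability tables, sentences, and function names are all invented for illustration.

```python
import math

# Toy probabilities invented for illustration only; they are not project data.
LANGUAGE_MODEL = {"the hotel": 0.6, "hotel the": 0.1}  # P(e)
TRANSLATION_MODEL = {  # P(f | e)
    ("l'albergo", "the hotel"): 0.5,
    ("l'albergo", "hotel the"): 0.5,
}


def log_score(source, candidate):
    """Return log P(e) + log P(f | e), or -inf if either probability is zero."""
    p = LANGUAGE_MODEL.get(candidate, 0.0) * TRANSLATION_MODEL.get(
        (source, candidate), 0.0
    )
    return math.log(p) if p > 0.0 else float("-inf")


def decode(source, candidates):
    """Pick the candidate translation with the highest noisy-channel score."""
    return max(candidates, key=lambda e: log_score(source, e))


best = decode("l'albergo", ["hotel the", "the hotel"])
```

In this toy case both candidates are equally likely under the translation model, so the language model decides in favour of the fluent word order. An interlingua-based system would instead map the source into a language-independent meaning representation and generate the target from it.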
10WP3-WP4 Technologies for emotions
- Consideration of the emotional state of both
partners in computer-mediated human-human
communication, to enhance the quality of the
exchange.
- Understand how the machine can support
emotionally more adequate exchanges in
Human-Computer Interaction.
- Extend attention to observable paralinguistic and
extra-linguistic markers, besides the linguistic
ones.
11WP3-WP4 Technologies for emotions
Two workpackages
- WP3 focusing on speech (analysis/recognition and
synthesis)
- WP4 focusing on synthetic faces
12WP3 Technologies for emotions: speech
- Identification, extraction and assessment of
prosodic and other linguistic cues correlated
with, and indicating the expression of emotional
states in speech.
- Definition and assessment, in conjunction with
WP4, of a technological baseline for believable
expressive agents (talking heads), capable of
communicating emotions through speech and facial
gestures.
13WP3 Technologies for emotions: speech
Scenario
- Analysis: automatic dialogue systems; interaction
with entertainment robots; human-machine,
telephone-based communication.
- Synthesis: a broad scenario in which
communication is mediated by expressive agents.
14WP3 Technologies for emotions: speech
- Use of a large feature vector modelling the
chosen prosodic parameters: fundamental frequency
(F0), energy, duration, and pauses.
- For each relevant emotional phenomenon, a separate
classifier will be used, whose output will be a
probability rating.
- All probabilities will be weighted (using
automatic optimisation methods), yielding a
single probability for each emotional state.
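The weighting step above can be sketched as a normalised weighted average of the per-classifier probability ratings. This is only an illustration under stated assumptions: the classifier names, the two emotional states, and the hand-picked weights are hypothetical, and the hand-picked weights stand in for whatever the automatic optimisation would produce.

```python
def combine_ratings(ratings, weights):
    """Fuse per-classifier probability ratings into one probability per state.

    ratings: {classifier_name: {state: probability}}
    weights: {classifier_name: non-negative weight}
    """
    total_weight = sum(weights[name] for name in ratings)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    states = next(iter(ratings.values())).keys()
    return {
        state: sum(weights[name] * ratings[name][state] for name in ratings)
        / total_weight
        for state in states
    }


# Hypothetical ratings from three classifiers, one per prosodic cue.
ratings = {
    "f0": {"neutral": 0.2, "angry": 0.8},
    "energy": {"neutral": 0.4, "angry": 0.6},
    "pauses": {"neutral": 0.7, "angry": 0.3},
}
# Hand-picked weights standing in for automatically optimised ones.
weights = {"f0": 2.0, "energy": 1.0, "pauses": 1.0}

combined = combine_ratings(ratings, weights)
```

Because each classifier's ratings sum to one and the weights are normalised, the combined values also sum to one and can be read as a single probability per emotional state.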
15WP3 Technologies for emotions: speech
- Correlate syntactic and pragmatic/semantic
parameters to prosodic ones, finding the
combinations that yield better predictions.
- These will then be used to build a prosodic model
for each emotional state.
- Use of different classifiers, like classification
and regression trees, linear regression
techniques, neural networks, etc., comparing and
integrating their results.
16WP4 Technologies for emotions: synthetic faces
Definition and assessment of a
technological baseline for believable virtual
agents in the form of talking heads, which
can communicate emotions by using both
the speech synthesis to be developed in WP3 and
facial gestures.
17WP4 Technologies for emotions: synthetic faces
Scenario
- Human-computer interactive communication
- Spoken dialogue systems
- 3D animated agents
18WP4 Technologies for emotions: synthetic faces
- Development of a model of predefined prototypical
facial gestures for the relevant subset of basic
emotions.
- Based on available and collected data, the
generation models will be augmented to handle the
complex interaction/integration of the linguistic
and extralinguistic signals. The result will be a
set of gesture libraries for controlling the
facial expression of emotions.
19WP5 Speech technologies for children
Objectives
Establish ASR (automatic speech
recognition) baselines for children's speech in
English, German, Italian and Swedish.
20WP5 Speech technologies for children
Scenario
- Reading tutor
- Interactive learning tools
- Conversational interfaces for children
21WP5 Speech technologies for children
- Acoustic feature extraction
- Inter-speaker acoustic variability reduction
through vocal tract length normalization
- Acoustic modeling
- Recognition of spontaneous speech spoken by
children
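Vocal tract length normalization is often implemented as a speaker-specific warping of the frequency axis applied before feature extraction. The sketch below shows one widely used piecewise-linear warp. It is an illustration only: the slides do not specify the warping function, and the 8000 Hz upper limit, the 0.8 cutoff ratio, and the function name are assumptions.

```python
def warp_frequency(freq_hz, alpha, f_max=8000.0, cutoff_ratio=0.8):
    """Piecewise-linear frequency warp with warping factor alpha.

    Below the cutoff the frequency is scaled by alpha. Above it, a second
    linear segment joins (cutoff, alpha * cutoff) to (f_max, f_max), so the
    warped axis still spans 0..f_max.
    """
    if not 0.0 <= freq_hz <= f_max:
        raise ValueError("freq_hz must lie in [0, f_max]")
    cutoff = cutoff_ratio * f_max
    if alpha <= 0.0 or alpha * cutoff >= f_max:
        raise ValueError("alpha is out of range for this cutoff")
    if freq_hz <= cutoff:
        return alpha * freq_hz
    slope = (f_max - alpha * cutoff) / (f_max - cutoff)
    return alpha * cutoff + slope * (freq_hz - cutoff)


# Warp a few frequencies with a hypothetical speaker-specific factor.
warped = [warp_frequency(f, alpha=1.1) for f in (1000.0, 6400.0, 8000.0)]
```

In practice the factor alpha is usually chosen per speaker, for example by a grid search over a small range that maximises the likelihood of the speaker's data under the acoustic models. Whether values above or below 1 correspond to a shorter vocal tract depends on the convention of the front end.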
22Expected Results
PFSTAR intends to provide the European R&D
community with the technological baselines for
future research and development efforts, with a
strong focus on achieving the common goal of
bringing a solution to Multilingual and
Multisensorial Communication.
23Expected results WP2
Speech translation technologies
- Improvement on current baselines
- Comparison across various application scenarios
of different approaches to contribute to the
definition of new research directions and
specific target applications for each approach.
24Expected results WP3
Technologies for emotions: speech (1)
- Baseline results for different parameters
- Recommendations for where to focus more intensive
research (classification technology, prosodic
features, linguistic features, and units to be
classified) based on results from realistic data
rather than predefined sentences.
25Expected results WP3
Technologies for emotions: speech (2)
- A classification of the different emotion classes
which will be tunable according to a cost
function, so that the overall system performance,
rather than the pure recognition rate, can be
optimised.
- Assessment of the interplay of different
linguistic parameters in synthesis.
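One standard way to make a classification tunable by a cost function is to choose the decision with the lowest expected cost instead of the most probable class. The sketch below illustrates that idea only: the states, probabilities, and cost values are hypothetical, and the slides do not say which cost function the project intends to use.

```python
def min_cost_decision(posteriors, cost):
    """Return the decision with the lowest expected cost.

    posteriors: {true_state: P(true_state | utterance)}
    cost: {true_state: {decision: penalty for deciding it in that state}}
    """
    decisions = next(iter(cost.values())).keys()

    def expected_cost(decision):
        return sum(p * cost[state][decision] for state, p in posteriors.items())

    return min(decisions, key=expected_cost)


# Hypothetical classifier output for one utterance.
posteriors = {"neutral": 0.7, "angry": 0.3}

# Symmetric costs reproduce the usual "most probable class" rule.
symmetric = {
    "neutral": {"neutral": 0.0, "angry": 1.0},
    "angry": {"neutral": 1.0, "angry": 0.0},
}
# Here a missed angry user is assumed to be five times worse than a false alarm.
asymmetric = {
    "neutral": {"neutral": 0.0, "angry": 1.0},
    "angry": {"neutral": 5.0, "angry": 0.0},
}

plain_choice = min_cost_decision(posteriors, symmetric)
tuned_choice = min_cost_decision(posteriors, asymmetric)
```

With the asymmetric costs the decision flips to "angry" even though "neutral" is more probable, which is the sense in which overall system performance, rather than the pure recognition rate, is being optimised.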
26Expected results WP4
Technologies for emotions: synthetic faces
- Definition and assessment of a technological
baseline for believable virtual agents in the
form of talking heads
- Collection and annotation of a relatively small
but varied database of audiovisual emotional
speech in dialogue situations in the target
languages Italian and Swedish
27Expected results WP5
Speech technologies for children
- Baselines for the involved languages (English,
Italian, Swedish and German), with a significant
increase in recognition rate.
- An understanding of the extent of inter-speaker
variability and of intra-speaker variability with
respect to adults.
- An assessment of the importance of
children-specific pronunciation dictionaries and
children-specific language models.
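Recognition rate for ASR baselines is commonly reported through the word error rate (WER): the word-level edit distance between the reference transcript and the recogniser output, divided by the number of reference words. The sketch below assumes that metric, which the slides do not name, and uses invented example sentences.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the processed reference words
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion of a reference word
                    current[j - 1] + 1,  # insertion of a hypothesis word
                    previous[j - 1] + (ref_word != hyp_word),  # substitution
                )
            )
        previous = current
    return previous[-1] / len(ref)


# Invented example: one reference word is missing from the recogniser output.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Word accuracy is then 1 minus the WER, so comparing WER on the same test set before and after an adaptation step is one way to quantify an increase in recognition rate.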
28PFSTAR main steps
Milestones
- M1: Specifications for technological baselines
and assessment procedures
- M2: First set of results
- M3: Final set of results
- M4: Final Workshop open to external participation
Time line: m1, m16, m24, after m24