Getting started - PowerPoint PPT Presentation

Slides: 29

Provided by: Anna170

Transcript and Presenter's Notes

Title: Getting started

1
Preparing Future multiSensorial inTerAction
Research
2
Outline

Introduction
Project objectives
Participants
Objectives, Scenarios, Approaches for each WP
Expected results
Main steps

3
Introduction

Speech to speech translation
Multilingual and Multisensorial Communication
(MMC)
Detection and expressions of emotional states
Core speech technologies for children
PFSTAR intends to help establish future
activities in the field of MMC on a firmer basis
by providing technological baselines, comparative
evaluations, and assessments of the prospects of
core technologies, from which future research
and development efforts can build.
4
Project objectives

The project builds on years of research
already conducted in several national and
international research projects (NESPOLE!,
C-STAR, Verbmobil, SmartKom). PFSTAR wants to
improve on, refine, stabilise, and align current
achievements to turn them into true technological
baselines along with careful assessments and
evaluations.

5
Project objectives

The goal of this project is to help advance
research and lay the foundations for
future efforts on the topic of Multilingual and
Multisensorial Communication.

6
Participants
- Istituto Trentino di Cultura, Centro per la Ricerca Scientifica e Tecnologica (ITC-irst)
- Interactive Systems Laboratories at Universitaet Karlsruhe (UKA)
- Institute for Pattern Recognition of Friedrich-Alexander Universitaet - Erlangen Nurnberg (UERLN)
- Department of Electronic, Electrical & Computing Engineering of the University of Birmingham (UB)
- Kungl Tekniska Hogskolan (KTH)
- RWTH Computer Science Department
- Istituto di Scienze e Tecnologie della Cognizione, Sezione di Padova Fonetica e Dialettologia, CNR
7
WP2 Technologies for speech translation
Objectives

Comparative evaluation and integration of
different technological baselines for speech to
speech translation over a range of application
scenarios.

8
WP2 Technologies for speech translation
Scenarios
- Human to human interaction (tourism and
traveling domains)
- Document translation (open domain)
- Cross-language information retrieval (open domain)
9
WP2 Technologies for speech translation

Approaches

- Interlingua-based approaches
- Direct translation approaches (statistical
models)

10
WP3-WP4 Technologies for emotions

Common Objectives

Consideration of the emotional state of both
partners in computer-mediated human-human
communication, to enhance the quality of the
exchange.
Understand how the machine can support
emotionally more adequate exchanges in
Human-Computer Interaction.
Extend attention to observable paralinguistic and
extra-linguistic markers, besides the linguistic
ones.

11
WP3-WP4 Technologies for emotions

Approach

Two workpackages
WP3 focusing on speech (analysis/recognition and
synthesis)
WP4 focusing on synthetic faces

12
WP3 Technologies for emotions: speech

Objectives

Identification, extraction and assessment of
prosodic and other linguistic cues correlated
with, and indicating the expression of emotional
states in speech.
Definition and assessment, in conjunction with
WP4, of a technological baseline for believable
expressive agents (talking heads), capable of
communicating emotions through speech and facial
gestures.

13
WP3 Technologies for emotions: speech
Scenario

Analysis: automatic dialogue systems;
interaction with entertainment robots;
human-machine, telephone-based communication.
Synthesis: a broad scenario in which
communication is mediated by expressive agents.

14
WP3 Technologies for emotions: speech

Approaches - analysis

Use of a large feature vector modelling the
chosen prosodic parameters: fundamental frequency
(F0), energy, duration, and pauses.
For each relevant emotional phenomenon, a separate
classifier will be used, whose output will be a
probability rating.
All probabilities will be weighted (using
automatic optimisation methods), yielding a
single probability for each emotional state.
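
The weighting step above could be sketched as follows. This is only an illustration: the slides do not say how the probabilities are combined, so a normalised weighted average is assumed here, the weights are set by hand rather than by automatic optimisation, and the phenomenon and state names are hypothetical.

```python
from typing import Dict


def fuse_ratings(
    ratings: Dict[str, float],
    weights: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Combine per-phenomenon probability ratings into one value per state.

    ratings: phenomenon name -> probability rating from its own classifier.
    weights: emotional state -> {phenomenon name -> non-negative weight}.
    Assumption: each state's probability is the weighted average of the
    ratings, so it stays in [0, 1] whenever the ratings do.
    """
    fused: Dict[str, float] = {}
    for state, state_weights in weights.items():
        total = sum(state_weights.values())
        if total <= 0:
            raise ValueError(f"weights for {state!r} must sum to a positive value")
        fused[state] = sum(
            weight * ratings[phenomenon]
            for phenomenon, weight in state_weights.items()
        ) / total
    return fused


# Hypothetical ratings and weights, for illustration only.
example_ratings = {"raised_f0": 0.8, "long_pauses": 0.2}
example_weights = {
    "anger": {"raised_f0": 3.0, "long_pauses": 1.0},
    "sadness": {"raised_f0": 1.0, "long_pauses": 3.0},
}
print(fuse_ratings(example_ratings, example_weights))
```

In a real system the weights would be the quantity tuned by the optimisation method mentioned on the slide.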

15
WP3 Technologies for emotions: speech

Approaches - synthesis

Correlate syntactic and pragmatic/semantic
parameters to prosodic ones, finding the
combinations that yield better predictions.
These will then be used to build a prosodic model
for each emotional state.
Use of different classifiers, like classification
and regression trees, linear regression
techniques, neural networks, etc., comparing and
integrating their results.

16
WP4 Technologies for emotions: synthetic faces

Objectives

Definition and assessment of a
technological baseline for believable virtual
agents in the form of talking heads, which
can communicate emotions by using both
the speech synthesis to be developed in WP3 and
facial gestures.
17
WP4 Technologies for emotions: synthetic faces
Scenario
Human-computer interactive communication
Spoken dialogue systems
3D animated agents
18
WP4 Technologies for emotions: synthetic faces

Approaches

Development of a model of predefined prototypical
facial gestures for the relevant subset of basic
emotions.
Based on available and collected data, the
generation models will be augmented to handle the
complex interaction/integration of the linguistic
and extralinguistic signals. The result will be a
set of gesture libraries for controlling the
facial expression of emotions.

19
WP5 Speech technologies for children
Objectives
Establish ASR (automatic speech
recognition) baselines for children's speech in
English, German, Italian and Swedish.
20
WP5 Speech technologies for children
Scenario

Reading tutor
Interactive learning tools
Conversational interfaces for children

21
WP5 Speech technologies for children

Approaches

Acoustic feature extraction
Inter-speaker acoustic variability reduction
through vocal tract length normalization
Acoustic modeling
Recognition of spontaneous speech spoken by
children
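
Vocal tract length normalization is often implemented as a warping of the frequency axis before the filterbank is applied. The slides do not say which warping function the project uses, so the sketch below assumes a common piecewise-linear form. The function name, the 8 kHz Nyquist frequency, and the knee position are illustrative assumptions.

```python
def warp_frequency(
    freq_hz: float,
    alpha: float,
    nyquist_hz: float = 8000.0,
    knee_ratio: float = 0.875,
) -> float:
    """Piecewise-linear frequency warping for a given warping factor alpha.

    Below the knee the frequency is scaled by alpha. Above the knee a second
    linear segment runs up to (nyquist_hz, nyquist_hz), so the warped axis
    still covers exactly the range 0..nyquist_hz.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("freq_hz must lie between 0 and nyquist_hz")

    # Place the knee so that its warped position stays below the Nyquist
    # frequency even when alpha is greater than 1.
    knee = knee_ratio * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= knee:
        return alpha * freq_hz

    # Upper segment: straight line from (knee, alpha * knee) to (nyquist, nyquist).
    slope = (nyquist_hz - alpha * knee) / (nyquist_hz - knee)
    return alpha * knee + slope * (freq_hz - knee)


# Under this convention a factor below 1 compresses the axis, the direction
# expected for speakers with shorter vocal tracts than the reference.
for f in (500.0, 2000.0, 7500.0):
    print(f, warp_frequency(f, alpha=0.9))
```

In practice the factor is usually estimated per speaker, for example by trying a small grid of values and keeping the one under which the acoustic model scores the speaker's data best.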

22
Expected Results
PFSTAR intends to provide the European R&D
community with the technological baselines for
future research and development efforts, with a
strong focus on the common goal of
bringing a solution to Multilingual and
Multisensorial Communication.
23
Expected results WP2
Speech translation technologies

Improvement on current baselines
Comparison across various application scenarios
of different approaches to contribute to the
definition of new research directions and
specific target applications for each approach.

24
Expected results WP3
Technologies for emotions: speech (1)

Baseline results for different parameters
Recommendations for where to put more intensive
research (classification technology, prosodic
features, linguistic features, and units to be
classified) based on results from realistic data
rather than predefined sentences.

25
Expected results WP3
Technologies for emotions: speech (2)

A classification of the different emotion classes
which will be tunable according to a cost
function, so that the overall system performance,
rather than the pure recognition rate, can be
optimised
Assessment of the interplay of different
linguistic parameters in synthesis.
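
One way to make a classification tunable through a cost function is to choose the decision with the lowest expected cost instead of the most probable class. The sketch below assumes this standard minimum-risk rule. The slides do not specify the cost function, and the class names and cost values are hypothetical.

```python
from typing import Dict


def decide(
    posteriors: Dict[str, float],
    cost: Dict[str, Dict[str, float]],
) -> str:
    """Return the decision with the lowest expected cost.

    posteriors: emotion class -> probability assigned by the recogniser.
    cost: decision -> {true class -> cost of making that decision}.
    """
    def expected_cost(decision: str) -> float:
        return sum(
            probability * cost[decision][true_class]
            for true_class, probability in posteriors.items()
        )

    return min(cost, key=expected_cost)


posteriors = {"neutral": 0.6, "anger": 0.4}

# Equal costs for every error: the most probable class is chosen.
zero_one = {
    "neutral": {"neutral": 0.0, "anger": 1.0},
    "anger": {"neutral": 1.0, "anger": 0.0},
}
# Hypothetical application where missing an angry user is five times worse.
anger_sensitive = {
    "neutral": {"neutral": 0.0, "anger": 5.0},
    "anger": {"neutral": 1.0, "anger": 0.0},
}
print(decide(posteriors, zero_one), decide(posteriors, anger_sensitive))
```

Changing the cost table moves the decision boundary without retraining the classifier, which is what lets overall system performance, rather than the raw recognition rate, be the target.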

26
Expected results WP4
Technologies for emotions: synthetic faces

Definition and assessment of a technological
baseline for believable virtual agents in the
form of talking heads
Collection and annotation of a relatively small but
varied database of audiovisual emotional speech
in dialogue situations in the target languages
Italian and Swedish

27
Expected results WP5
Speech technologies for children

Baselines for the involved languages (English,
Italian, Swedish and German), with a significant
increase in recognition rate.
An understanding of the extent of inter-speaker
variability and of intra-speaker variability with
respect to adults.
An assessment of the importance of
children-specific pronunciation dictionaries and
children-specific language models.

28
PFSTAR main steps
M1: Specifications for technological baselines and
assessment procedures
M2: First set of results
M3: Final set of results
M4: Final Workshop open to external participation
Time line: m1, m16, m24, after m24
