1
Dagstuhl Seminar: Coordination and Fusion in Multimodal Interaction
Media Coordination in SmartKom
Norbert Reithinger
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
Stuhlsatzenhausweg 3, Geb. 43.1, 66123 Saarbrücken
Tel. (0681) 302-5346
Email: bert_at_dfki.de
www.smartkom.org
www.dfki.de/bert
2
Overview

Situated Delegation-oriented Dialog Paradigm
More About the System Software
Media Coordination Issues
Media Processing: The Data Flow
Processing the User's State
Media Fusion
Media Design
Conclusion

3
The SmartKom Consortium
Project budget: 25.5 million
Project duration: 4 years (September 1999 to September 2003)
Main contractor: DFKI Saarbrücken
MediaInterface
Saarbrücken
Berkeley
European Media Lab
Dresden
Univ. of Munich
Univ. of Stuttgart
Heidelberg
Munich
Univ. of Erlangen
Stuttgart
Ulm
Aachen
4
Situated Delegation-oriented Dialog Paradigm
User: specifies goal, delegates task
Personalized Interaction Agent: asks questions, presents results
User and agent: cooperate on problems
IT Services: Service 1, Service 2, Service 3
5
More About the System
6
More About the System

Modules realized as independent processes
Not all modules must be present (critical path: speech or graphic input to speech or graphic output)
(Mostly) independent of display size
Pool Communication Architecture (PCA) based on PVM for Linux and NT
Modules know about their I/O pools
Literature: Andreas Klüter, Alassane Ndiaye, Heinz Kirchmann: Verbmobil From a Software Engineering Point of View: System Design and Software Integration. In: Wolfgang Wahlster (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer, 2000.
Data exchanged using M3L documents (C:\Documents and Settings\bert\Desktop\SmartKom-Systeminfo\index.html)
All modules and pools are visualized here ...
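
The pool idea on this slide (independent modules that only know their input and output pools) can be illustrated with a minimal in-process sketch. This is not the PCA/PVM implementation. The class name `PoolBus`, the pool names, and the plain strings standing in for M3L documents are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class PoolBus:
    """Toy stand-in for a pool-based blackboard (hypothetical, not PCA)."""

    def __init__(self) -> None:
        # Maps a pool name to the handlers of all modules reading from it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, pool: str, handler: Callable[[str], None]) -> None:
        """Register a module's handler for one of its input pools."""
        self._subscribers[pool].append(handler)

    def publish(self, pool: str, document: str) -> None:
        """Deliver a document to every module subscribed to the pool."""
        for handler in list(self._subscribers[pool]):
            handler(document)


def run_demo() -> List[str]:
    bus = PoolBus()
    received: List[str] = []

    # Hypothetical analysis module: reads "speech.words", writes "speech.intention".
    def analysis(doc: str) -> None:
        bus.publish("speech.intention", f"<intention from='{doc}'/>")

    bus.subscribe("speech.words", analysis)
    bus.subscribe("speech.intention", received.append)
    bus.publish("speech.words", "show me this cinema")
    return received
```

Because a module is coupled only to pool names, one can be left out or replaced without touching the others, which matches the point that not all modules must be present.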

7
(No Transcript)
8
Media Coordination Issues

Input
Speech
Words
Prosody: boundaries, stress, emotion
Mimics (facial expression): neutral, anger
Gesture
Touch-free (Public scenario)
Touch-sensitive screen
Output
Display objects
Speech
Agent: posture, gesture, lip movement

9
Media Processing: The Data Flow
User State
Domain Information
System State
Speech
Speech, Agent's Posture and Behaviour
Mimics (Neutral or Anger)
Gesture
Display Objects with ref ID and Location
Prosody (emotion)
Presentation (Media Design)
Media Fusion
Interaction Modeling
Dialog-Core
10
The Input/Output Modules
11
Processing the User's State
12
Processing the User's State

User states: neutral and anger
Recognized from mimics (facial expression) and prosody
In case of anger, activate the dynamic help in the Dialog Core Engine
Elmar Nöth will hopefully tell you more about this in his talk "Modeling the User State - The Role of Emotions"
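
A minimal sketch of the decision described on this slide. The slides do not say how the mimics and prosody results are combined, so the averaging rule, the 0.5 threshold, and the names `Evidence`, `user_state`, and `needs_dynamic_help` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Anger score in [0, 1] from one recognizer (hypothetical format)."""
    anger: float


def user_state(mimics: Evidence, prosody: Evidence, threshold: float = 0.5) -> str:
    """Classify the user state; averaging and threshold are assumptions."""
    combined = (mimics.anger + prosody.anger) / 2
    return "anger" if combined >= threshold else "neutral"


def needs_dynamic_help(state: str) -> bool:
    """The dialog core activates dynamic help only for an angry user."""
    return state == "anger"
```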

13
Media Fusion
14
Gesture Processing

Objects on the screen are tagged with IDs
Gesture input
Natural gestures recognized by SIVIT
Touch sensitive screen
Gesture recognition
Location
Type of gesture: pointing, tarrying, encircling
Gesture analysis
Reference objects in the display are described as XML domain model (sub-)objects (M3L schemata)
Bounding box
Output: gesture lattice with hypotheses
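
The mapping from a gesture location to candidate screen objects can be sketched as a bounding-box test. This is an illustration under stated assumptions, not the SmartKom gesture analysis: the scoring formula, the `margin` parameter, and the object IDs used in the usage note are hypothetical, and a flat ranked list stands in for the gesture lattice.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScreenObject:
    object_id: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def gesture_hypotheses(
    point: Tuple[float, float],
    objects: List[ScreenObject],
    margin: float = 20.0,
) -> List[Tuple[str, float]]:
    """Return (object_id, score) pairs for a pointing gesture, best first.

    A point inside a box scores 1.0; the score falls linearly to 0.0 at
    `margin` pixels outside the box (an assumed scoring rule, margin > 0).
    """
    x, y = point
    hypotheses: List[Tuple[str, float]] = []
    for obj in objects:
        x0, y0, x1, y1 = obj.box
        # Distance from the point to the box; zero if the point is inside.
        dx = max(x0 - x, 0.0, x - x1)
        dy = max(y0 - y, 0.0, y - y1)
        distance = (dx * dx + dy * dy) ** 0.5
        if distance <= margin:
            hypotheses.append((obj.object_id, 1.0 - distance / margin))
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)
```

With two adjacent objects, say `cinema_1` and `cinema_2`, a point just outside the first yields both as hypotheses with different scores, so a later stage can still choose between them.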

15
Speech Processing

Speech recognizer produces a word lattice
Prosody inserts boundary and stress information
Speech analysis creates intention hypotheses with markers for deictic expressions

16
Media Fusion

Integrates gesture hypotheses into the intention hypotheses of speech analysis
Information restriction possible from both media
Possible but not necessary: correspondence of gestures and placeholders (deictic expressions/anaphora) in the intention hypothesis
Necessary: time coordination of gesture and speech information
Time stamps in ALL M3L documents!!
Output: sequence of intention hypotheses
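
The role of the time stamps can be shown with a small sketch: a deictic placeholder is filled by the best-scoring gesture that falls within a time window around it. The data classes, the 1.5-second window, and the "highest score wins" rule are assumptions for illustration. The actual fusion works on M3L hypotheses and is not described at this level in the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gesture:
    object_id: str
    time: float   # time stamp in seconds
    score: float  # confidence of the gesture hypothesis


@dataclass
class Slot:
    word: str
    time: float                     # time stamp in seconds
    deictic: bool                   # marked as deictic by speech analysis
    referent: Optional[str] = None  # filled in by fusion, if possible


def fuse(slots: List[Slot], gestures: List[Gesture], window: float = 1.5) -> List[Slot]:
    """Fill deictic slots with the best gesture inside the time window."""
    fused: List[Slot] = []
    for slot in slots:
        referent = slot.referent
        if slot.deictic:
            nearby = [g for g in gestures if abs(g.time - slot.time) <= window]
            if nearby:
                referent = max(nearby, key=lambda g: g.score).object_id
        fused.append(Slot(slot.word, slot.time, slot.deictic, referent))
    return fused
```

A deictic slot with no gesture nearby simply stays unresolved, reflecting that the correspondence between gestures and placeholders is possible but not necessary.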

17
Media Design (Media Fission)
18
Media Design

Starts with action planning
Definition of an abstract presentation goal
Presentation planner
Selects presentation, style, media, and the agent's general behaviour
Activates the natural language generator, which activates the speech synthesis, which returns audio data and a time-stamped phoneme/viseme sequence
Character animation realizes the agent's behaviour
Synchronized presentation of audio and visual information

19
Lip Synchronization with Visemes

Goal: present a speech prompt as naturally as possible
Viseme: elementary lip position
Correspondence of visemes and phonemes
Examples
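
The following sketch is not taken from the slides. It illustrates the phoneme/viseme correspondence: several phonemes share one lip position, so a time-stamped phoneme sequence collapses into a shorter viseme track. The mapping table and the viseme labels are invented for illustration and are not SmartKom's inventory.

```python
from typing import Dict, List, Tuple

# Illustrative mapping only; the viseme labels are made up for this sketch.
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "a": "open",
    "o": "rounded", "u": "rounded",
}


def viseme_track(phonemes: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Map (phoneme, start_time) pairs to (viseme, start_time) pairs.

    Consecutive phonemes with the same viseme are merged, because the lips
    do not need to move between them. Unknown phonemes map to "neutral".
    """
    track: List[Tuple[str, float]] = []
    for phoneme, start in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        if not track or track[-1][0] != viseme:
            track.append((viseme, start))
    return track
```

The start times carried through the track are what a character animation would need to keep the lip positions in step with the audio.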

20
Behavioural Schemata

Goal: Smartakus is always active to signal the state of the system
Four main states
Wait for user's input
User's input
Processing
System presentation
Current body movements
9 vital, 2 processing, 9 presentation (5 pointing, 2 movements, 2 face/mouth)
About 60 basic movements
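
The four main states can be sketched as a small state machine. The slides list the states and the movement groups but not the transitions or which group belongs to which state, so the cyclic order and the state-to-group assignment below are assumptions.

```python
from enum import Enum
from typing import Dict


class SystemState(Enum):
    WAIT_FOR_INPUT = "wait for user's input"
    USER_INPUT = "user's input"
    PROCESSING = "processing"
    PRESENTATION = "system presentation"


# Assumed cyclic order; the slides do not specify the transitions.
NEXT_STATE: Dict[SystemState, SystemState] = {
    SystemState.WAIT_FOR_INPUT: SystemState.USER_INPUT,
    SystemState.USER_INPUT: SystemState.PROCESSING,
    SystemState.PROCESSING: SystemState.PRESENTATION,
    SystemState.PRESENTATION: SystemState.WAIT_FOR_INPUT,
}

# Assumed assignment of the movement groups (vital, processing, presentation).
MOVEMENT_GROUP: Dict[SystemState, str] = {
    SystemState.WAIT_FOR_INPUT: "vital",
    SystemState.USER_INPUT: "vital",
    SystemState.PROCESSING: "processing",
    SystemState.PRESENTATION: "presentation",
}


def advance(state: SystemState) -> SystemState:
    """Move to the next state in the assumed cycle."""
    return NEXT_STATE[state]


def movement_group(state: SystemState) -> str:
    """Pick the movement group the agent draws from in this state."""
    return MOVEMENT_GROUP[state]
```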

21
Conclusion

Three implemented systems (Public, Home, Mobile)
Media coordination implemented
Backbone uses declarative knowledge sources and is rather flexible
Much remains to be done
Robustness
Complex speech expressions
Complex gestures (shape and timing)
Implementation of all user states
....
Reuse of modules in other contexts, e.g. in MIAMM
