1
Dagstuhl Seminar: Coordination and Fusion in Multimodal Interaction
Media Coordination in SmartKom
Norbert Reithinger
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
Stuhlsatzenhausweg 3, Geb. 43.1, 66123 Saarbrücken
Tel.: (0681) 302-5346
Email: bert_at_dfki.de
www.smartkom.org, www.dfki.de/bert
2
Overview
  • Situated Delegation-oriented Dialog Paradigm
  • More About the System Software
  • Media Coordination Issues
  • Media Processing: The Data Flow
  • Processing the User's State
  • Media Fusion
  • Media Design
  • Conclusion

3
The SmartKom Consortium
Project budget: 25.5 million
Project duration: 4 years (September 1999 to September 2003)
Main contractor: DFKI Saarbrücken
MediaInterface
Saarbrücken
Berkeley
European Media Lab
Dresden
Univ. of Munich
Univ. of Stuttgart
Heidelberg
Munich
Univ. of Erlangen
Stuttgart
Ulm
Aachen
4
Situated Delegation-oriented Dialog Paradigm
(Diagram: the User interacts with a Personalized Interaction Agent, which mediates access to IT Services: Service 1, Service 2, Service 3)
Interaction labels: specifies goal, delegates task; cooperate on problems; asks questions, presents results
5
More About the System
6
More About the System
  • Modules realized as independent processes
  • Not all modules have to be present (critical path:
    speech or graphic input to speech or graphic output)
  • (Mostly) independent of display size
  • Pool Communication Architecture (PCA) based on
    PVM for Linux and NT
  • Modules know about their I/O pools
  • Literature:
    • Andreas Klüter, Alassane Ndiaye, Heinz
      Kirchmann: Verbmobil From a Software Engineering
      Point of View: System Design and Software
      Integration. In: Wolfgang Wahlster (ed.): Verbmobil:
      Foundations of Speech-to-Speech Translation.
      Springer, 2000.
  • Data exchanged using M3L documents (C:\Documents
    and Settings\bert\Desktop\SmartKom-Systeminfo\index.html)
  • All modules and pools are visualized here ...
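
The pool idea above (independent modules that only know the pools they read from and write to) can be pictured with a small in-process sketch. This is not the PCA or PVM API: `PoolBus`, `subscribe`, `publish`, the pool names, and the placeholder XML strings are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class PoolBus:
    """Toy in-process stand-in for pool-based module coupling (not the real PCA)."""

    def __init__(self) -> None:
        # Maps a pool name to the handlers of all modules reading from that pool.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, pool: str, handler: Callable[[str], None]) -> None:
        """Register a module's handler as a reader of the given pool."""
        self._subscribers[pool].append(handler)

    def publish(self, pool: str, document: str) -> int:
        """Deliver a document to every reader of the pool; return the reader count."""
        handlers = self._subscribers.get(pool, [])
        for handler in handlers:
            handler(document)
        return len(handlers)


# Hypothetical pool names and placeholder documents (not real M3L).
bus = PoolBus()
received: List[str] = []
bus.subscribe("gesture.analysis", received.append)
delivered = bus.publish("gesture.analysis", "<gestureHypothesis/>")
undelivered = bus.publish("speech.analysis", "<intentionHypothesis/>")
```

A writer never addresses a reader directly, so a missing module (as in the second `publish` call) simply means nobody receives the document, which matches the point that not all modules have to be present.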

7
(No Transcript)
8
Media Coordination Issues
  • Input
    • Speech
      • Words
      • Prosody: boundaries, stress, emotion
    • Facial expression (mimics): neutral, anger
    • Gesture
      • Touch free (public scenario)
      • Touch-sensitive screen
  • Output
    • Display objects
    • Speech
    • Agent: posture, gesture, lip movement

9
Media Processing: The Data Flow
(Data flow diagram. Labels: User State; Domain Information; System State; Speech (input and output); Agent's Posture and Behaviour; Facial Expression (Neutral or Anger); Gesture; Display Objects with ref ID and Location; Prosody (emotion); Presentation (Media Design); Media Fusion; Interaction Modeling; Dialog-Core)
10
The Input/Output Modules
11
Processing the User's State
12
Processing the User's State
  • User states: neutral and anger
  • Recognized using facial expression (mimics) and
    prosody
  • In case of anger, activate the dynamic help in the
    Dialog Core Engine
  • Elmar Nöth will hopefully tell you more about
    this in his talk "Modeling the User State: The
    Role of Emotions"
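
As a rough illustration of the rule on this slide, the sketch below combines two anger scores and triggers help when the user state is anger. The averaging rule, the 0.5 threshold, and the function names are assumptions made for this example; the slides do not say how the two recognizers are combined.

```python
def user_state(facial_anger: float, prosody_anger: float, threshold: float = 0.5) -> str:
    """Classify the user state from two anger scores in [0, 1].

    Assumption: the scores are simply averaged; the real combination is not
    described in the talk.
    """
    combined = (facial_anger + prosody_anger) / 2.0
    return "anger" if combined >= threshold else "neutral"


def needs_dynamic_help(state: str) -> bool:
    """Dynamic help is activated only for the anger state."""
    return state == "anger"


angry = user_state(facial_anger=0.9, prosody_anger=0.7)
calm = user_state(facial_anger=0.2, prosody_anger=0.4)
```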

13
Media Fusion
14
Gesture Processing
  • Objects on the screen are tagged with IDs
  • Gesture input:
    • Natural gestures recognized by SIVIT
    • Touch-sensitive screen
  • Gesture recognition:
    • Location
    • Type of gesture: pointing, tarrying, encircling
  • Gesture analysis:
    • Reference object in the display described as XML
      domain model (sub-)objects (M3L schemata)
    • Bounding box
  • Output: gesture lattice with hypotheses
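
A minimal sketch of the gesture analysis step, assuming a pointing location is matched against the bounding boxes of ID-tagged display objects and that nearby objects become ranked hypotheses. The class, the scoring formula, the 50-pixel cutoff, and the object IDs are hypothetical; a flat ranked list stands in for the gesture lattice.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass(frozen=True)
class DisplayObject:
    object_id: str
    left: float
    top: float
    right: float
    bottom: float

    def distance_to(self, x: float, y: float) -> float:
        """Distance from (x, y) to the bounding box; 0.0 if the point is inside."""
        dx = max(self.left - x, 0.0, x - self.right)
        dy = max(self.top - y, 0.0, y - self.bottom)
        return hypot(dx, dy)


def gesture_hypotheses(
    objects: List[DisplayObject], x: float, y: float, max_distance: float = 50.0
) -> List[Tuple[str, float]]:
    """Return (object_id, score) pairs, best first; a score of 1.0 is a direct hit."""
    scored = []
    for obj in objects:
        distance = obj.distance_to(x, y)
        if distance <= max_distance:
            scored.append((obj.object_id, 1.0 - distance / max_distance))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical screen layout; the pointing gesture lands between two objects.
screen = [
    DisplayObject("cinema_1", 0, 0, 100, 50),
    DisplayObject("cinema_2", 110, 0, 210, 50),
    DisplayObject("map", 0, 300, 400, 600),
]
hypotheses = gesture_hypotheses(screen, x=102, y=25)
```

Keeping several scored candidates instead of a single winner leaves the final choice to a later stage, which can use the speech input to disambiguate.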

15
Speech Processing
  • Speech Recognizer produces word lattice
  • Prosody inserts boundary and stress information
  • Speech analysis creates intention hypotheses with
    markers for deictic expressions

16
Media Fusion
  • Integrates gesture hypotheses into the intention
    hypotheses of speech analysis
  • Information restriction possible from both media
  • Possible but not necessary: correspondence of
    gestures and placeholders (deictic expressions/
    anaphora) in the intention hypothesis
  • Necessary: time coordination of gesture and
    speech information
  • Time stamps in ALL M3L documents!!
  • Output: sequence of intention hypotheses
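
The role of time stamps in fusion can be sketched as follows: each deictic placeholder from speech is paired with the temporally closest unused gesture inside a time window, and stays unresolved if there is none. This is a strong simplification (single best hypotheses instead of lattices), and the data classes, the 1000 ms window, and the example values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Deictic:
    word: str
    time_ms: int


@dataclass(frozen=True)
class Gesture:
    object_id: str
    time_ms: int


def fuse(
    deictics: List[Deictic], gestures: List[Gesture], window_ms: int = 1000
) -> List[Tuple[str, Optional[str]]]:
    """Pair each deictic word with the closest-in-time unused gesture, if any."""
    unused = list(gestures)
    resolved: List[Tuple[str, Optional[str]]] = []
    for deictic in deictics:
        candidates = [g for g in unused if abs(g.time_ms - deictic.time_ms) <= window_ms]
        if candidates:
            best = min(candidates, key=lambda g: abs(g.time_ms - deictic.time_ms))
            unused.remove(best)  # one gesture resolves at most one placeholder
            resolved.append((deictic.word, best.object_id))
        else:
            # Correspondence is possible but not necessary: leave it open.
            resolved.append((deictic.word, None))
    return resolved


# Hypothetical utterance with two deictic words and two gestures.
spoken = [Deictic("here", 1200), Deictic("there", 2100)]
pointed = [Gesture("castle", 1300), Gesture("cinema_3", 5000)]
fused = fuse(spoken, pointed)
```

Without comparable time stamps on both inputs this pairing could not be computed at all, which is why the slide insists on them in every M3L document.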

17
Media Design (Media Fission)
18
Media Design
  • Starts with action planning
  • Definition of an abstract presentation goal
  • Presentation planner:
    • Selects presentation, style, media, and the
      agent's general behaviour
    • Activates the natural language generator, which
      activates the speech synthesis, which returns
      audio data and a time-stamped phoneme/viseme
      sequence
  • Character animation realizes the agent's
    behaviour
  • Synchronized presentation of audio and visual
    information

19
Lip Synchronization with Visemes
  • Goal: present a speech prompt as naturally as
    possible
  • Visemes: elementary lip positions
  • Correspondence of visemes and phonemes
  • Examples
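
To make the phoneme/viseme correspondence concrete, here is a small sketch that turns a time-stamped phoneme sequence into a lip-position track. The mapping table and viseme names are invented for this example and are not SmartKom's actual inventory; unknown phonemes fall back to a "neutral" viseme.

```python
from typing import Dict, List, Tuple

# Illustrative mapping only: several phonemes share one lip position.
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "a": "open",
    "o": "rounded", "u": "rounded",
}


def to_visemes(phonemes: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Map (phoneme, start_ms) pairs to (viseme, start_ms), merging repeats."""
    visemes: List[Tuple[str, int]] = []
    for phoneme, start_ms in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        # Consecutive phonemes with the same lip position need no new keyframe.
        if not visemes or visemes[-1][0] != viseme:
            visemes.append((viseme, start_ms))
    return visemes


timed_phonemes = [("m", 0), ("a", 80), ("m", 200), ("b", 260), ("o", 330)]
lip_track = to_visemes(timed_phonemes)
```

Because the start times are carried over from the synthesizer output, the animation can be played back in step with the audio.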

20
Behavioural Schemata
  • Goal: Smartakus is always active to signal the
    state of the system
  • Four main states:
    • Wait for user's input
    • User's input
    • Processing
    • System presentation
  • Current body movements:
    • 9 vital, 2 processing, 9 presentation (5
      pointing, 2 movements, 2 face/mouth)
    • About 60 basic movements
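
The four main states can be read as a small state machine. The sketch below assumes a simple cycle through the states; the event names and the transition table are hypothetical, since the slide lists only the states themselves.

```python
from enum import Enum
from typing import Dict, List, Tuple


class AgentState(Enum):
    WAIT_FOR_INPUT = "wait for user's input"
    USER_INPUT = "user's input"
    PROCESSING = "processing"
    PRESENTATION = "system presentation"


# Assumed transitions: one cycle per dialog turn.
TRANSITIONS: Dict[Tuple[AgentState, str], AgentState] = {
    (AgentState.WAIT_FOR_INPUT, "input_started"): AgentState.USER_INPUT,
    (AgentState.USER_INPUT, "input_finished"): AgentState.PROCESSING,
    (AgentState.PROCESSING, "result_ready"): AgentState.PRESENTATION,
    (AgentState.PRESENTATION, "presentation_done"): AgentState.WAIT_FOR_INPUT,
}


def step(state: AgentState, event: str) -> AgentState:
    """Follow a known transition; ignore events that do not apply to the state."""
    return TRANSITIONS.get((state, event), state)


state = AgentState.WAIT_FOR_INPUT
trace: List[AgentState] = [state]
for event in ["input_started", "input_finished", "unknown", "result_ready", "presentation_done"]:
    state = step(state, event)
    trace.append(state)
```

Since every state is always defined, the animation layer can pick a matching body movement at any moment, which fits the goal that the agent is always active.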

21
Conclusion
  • Three implemented systems (Public, Home, Mobile)
  • Media coordination implemented
  • Backbone uses declarative knowledge sources and
    is rather flexible
  • A lot remains to be done:
    • Robustness
    • Complex speech expressions
    • Complex gestures (shape and timing)
    • Implementation of all user states
    • ...
  • Reuse of modules in other contexts, e.g. in MIAMM