Title: From Vocal to Multimodal Dialogue Management
Agnes Lisowska, ISSCO/TIM/ETI, University of Geneva, Switzerland
Miroslav Melichar, Marita Ailomaa, Martin Rajman, LIA/CGC, EPFL, Lausanne, Switzerland
Pavel Cenek, Masaryk University in Brno, Czech Republic
Our main idea: adapt our existing vocal dialogue management system to cope with multimodal input.
Multimodal interactive system: Archivus
- Access to recorded and annotated meeting data
- Answers questions like "What were John's questions related to the budget in the meeting in April?"
- Modalities
  - Input: speech, text, mouse and touch screen
  - Output: speech, text, graphics, video
- System features
  - Interaction is controlled by a (multimodal) dialogue manager.
  - Implemented using our software toolkit, which supports Wizard of Oz experiments as an integral part of system development.
  - Hidden human operators (wizards) help to interpret multimodal user input and to adjust system output when necessary.
Why a Multimodal Interactive Dialogue System?
In comparison to voice-only systems, it
- increases robustness and flexibility for user input (several input modalities are possible)
- increases users' understanding of the interaction context (the screen provides additional feedback)
Adapting the vocal dialogue system for multimodal
input
- Vocal dialogue management
  - Frame based: a frame with a hierarchical slot structure
  - Generic Dialogue Node (GDN): specifies the interaction needed to obtain valid values for the associated slots (defines the current question under discussion)
  - Dialogue strategies: local (within a GDN) and global (navigation between GDNs, dialogue planning)
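The frame and GDN concepts above can be illustrated with a minimal Python sketch. It is not the authors' toolkit: the class `GDN`, the function `next_missing`, the validity check, and the slot names are all hypothetical, chosen only to show a hierarchical slot structure with a local strategy (validating a value inside one node) and a global strategy (choosing which node to discuss next).

```python
from typing import Callable, List, Optional


class GDN:
    """Hypothetical Generic Dialogue Node tied to one slot of the frame."""

    def __init__(self, slot: str, prompt: str,
                 is_valid: Callable[[str], bool] = lambda v: bool(v.strip())):
        self.slot = slot            # name of the associated slot
        self.prompt = prompt        # question under discussion for this slot
        self.is_valid = is_valid    # validity check for candidate values
        self.value: Optional[str] = None
        self.parent: Optional["GDN"] = None
        self.children: List["GDN"] = []

    def add(self, child: "GDN") -> "GDN":
        """Attach a sub-slot node, building the hierarchical slot structure."""
        child.parent = self
        self.children.append(child)
        return child

    def fill(self, candidate: str) -> bool:
        """Local strategy: keep the value only if it is valid for this slot."""
        if self.is_valid(candidate):
            self.value = candidate
            return True
        return False


def next_missing(node: GDN) -> Optional[GDN]:
    """Global strategy: depth-first search for the first unfilled leaf slot."""
    if not node.children:
        return node if node.value is None else None
    for child in node.children:
        found = next_missing(child)
        if found is not None:
            return found
    return None


# Illustrative frame for a meeting query (slot names are invented).
root = GDN("query", "What are you looking for?")
meeting = root.add(GDN("meeting", "Which meeting?"))
month = meeting.add(GDN("month", "In which month was the meeting?"))
topic = root.add(GDN("topic", "Which topic?"))

month.fill("April")
print(next_missing(root).prompt)  # the system would now ask about the topic
```

In this sketch the system keeps asking for whatever `next_missing` returns, which corresponds to the system-driven behaviour of a voice-only dialogue.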
- Multimodal dialogue management
  - GDN extended to Multimodal GDN (mGDN)
  - An mGDN is associated with a graphical component and contains local dialogue strategies for multimodal interaction (in addition to grammars and prompts).
  - The interaction management strategies had to undergo several modifications, because user behavior differs from voice-only interaction.
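One way to picture an mGDN is as a node that owns a graphical component and accepts a value for its slot from any input modality. The sketch below is an assumption-laden illustration, not the system's actual design: `InputEvent`, `MGDN`, the keyword-spotting "grammar", and the widget item ids are all invented for the example.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class InputEvent:
    modality: str   # "speech", "text", "mouse" or "touch"
    payload: str    # recognized or typed words, or the id of the selected item


class MGDN:
    """Hypothetical multimodal GDN: one slot plus its graphical component."""

    def __init__(self, slot: str, prompt: str, widget_id: str,
                 vocabulary: Set[str], clickable: Dict[str, str]):
        self.slot = slot
        self.prompt = prompt
        self.widget_id = widget_id      # graphical component shown on screen
        self.vocabulary = vocabulary    # crude stand-in for the node's grammar
        self.clickable = clickable      # widget item id -> slot value
        self.value: Optional[str] = None

    def handle(self, event: InputEvent) -> bool:
        """Local multimodal strategy: map input from any modality to a value."""
        if event.modality in ("mouse", "touch"):
            value = self.clickable.get(event.payload)
        elif event.modality in ("speech", "text"):
            # Naive keyword spotting; a real system would use its grammars.
            words = re.findall(r"[a-z]+", event.payload.lower())
            value = next((w for w in words if w in self.vocabulary), None)
        else:
            value = None
        if value is None:
            return False
        self.value = value
        return True


month = MGDN("month", "Which month?", "month_list",
             vocabulary={"april", "may"},
             clickable={"item_april": "april", "item_may": "may"})
month.handle(InputEvent("speech", "the meeting in April?"))
print(month.value)
```

The point of the design is that the rest of the dialogue manager only sees the slot value, regardless of whether it arrived by voice, keyboard, mouse, or touch.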
- New role of system prompts
  - Vocal dialogue system: prompts inform the user about the information required by the system.
  - Multimodal dialogue system: requests for information needed from the user are provided graphically and are often redundant (because it is the user who decides what information to provide to the system and knows the dialogue context).
  - Prompts have a new function: they provide advice to the user and foster interaction in natural language.
    - The advice typically concerns several elements on the screen, not only one GDN in isolation.
    - Examples: "All books satisfy your search criteria", "You can access the document through the book".
    - Such prompts are difficult to predict and their triggering conditions are hard to define.
    - We used a wizard to optionally modify the default prompts issued by the system.
    - 18 of the prompts were changed during the experiments.
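The wizard's optional intervention on prompts can be sketched as a hook placed between the dialogue manager and the output. Everything here is hypothetical (`PromptChannel`, `WizardHook`, `scripted_wizard`, and the default prompt texts); in the real experiments the wizard was a human operator, which the scripted function below only imitates.

```python
from typing import Callable, List, Optional, Tuple

# A wizard hook sees the default prompt and returns a replacement,
# or None to let the default prompt through unchanged.
WizardHook = Callable[[str], Optional[str]]


class PromptChannel:
    """Hypothetical output channel where a wizard may override prompts."""

    def __init__(self, wizard: WizardHook):
        self.wizard = wizard
        self.log: List[Tuple[str, str]] = []   # (default prompt, issued prompt)

    def issue(self, default_prompt: str) -> str:
        replacement = self.wizard(default_prompt)
        issued = default_prompt if replacement is None else replacement
        self.log.append((default_prompt, issued))
        return issued

    def changed(self) -> int:
        """Number of prompts the wizard actually modified."""
        return sum(1 for default, issued in self.log if default != issued)


def scripted_wizard(default_prompt: str) -> Optional[str]:
    """Stand-in for the human wizard (the default prompt text is invented)."""
    if default_prompt == "Please select a book.":
        return "All books satisfy your search criteria."
    return None


channel = PromptChannel(scripted_wizard)
first = channel.issue("Please select a book.")
second = channel.issue("Which meeting are you interested in?")
print(first, "|", second, "|", channel.changed())
```

Logging both the default and the issued prompt is what makes it possible to count afterwards how many prompts the wizard changed.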
- Dialogue strategies are more user-driven in our multimodal system!
  - Vocal dialogue system: the user expresses some initial wishes at the beginning of the interaction, and the system progressively asks for missing information, guiding the user towards the goal of the interaction.
    - Guiding the user is important, as they may not know 1) what information the system is able to process and 2) what information helps them progress optimally within the dialogue at a given time.
  - Multimodal dialogue system: users prefer to participate more actively in the interaction because they have a better understanding of its current context.
    - Thanks to the screen output, users can easily see, e.g., what types of information the system requires, a partial view of the current search space, how to solve an over-constrained situation, etc.
  - Less control over the interaction is required from a multimodal system (users do not necessarily want to follow the system's suggestions).
    - The strategy for selecting the next dialogue focus (GDN) was made more passive: after obtaining a value for an mGDN, the multimodal system only goes up in the GDN hierarchy instead of selecting the GDN associated with the next piece of missing information.
    - The focus of the dialogue can be changed by the user by selecting the appropriate part of the graphical interface.
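The difference between the two focus-selection strategies can be made concrete with a small sketch. The names `Node`, `first_missing`, `next_focus_vocal`, and `next_focus_multimodal`, as well as the example hierarchy, are assumptions made for illustration; only the behaviour (active selection of the next missing slot versus passively moving up one level) comes from the description above.

```python
from typing import List, Optional


class Node:
    """Hypothetical node of the GDN hierarchy."""

    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []
        self.value: Optional[str] = None
        if parent is not None:
            parent.children.append(self)


def first_missing(node: Node) -> Optional[Node]:
    """Depth-first search for the first leaf that still has no value."""
    if not node.children:
        return node if node.value is None else None
    for child in node.children:
        found = first_missing(child)
        if found is not None:
            return found
    return None


def next_focus_vocal(root: Node, just_filled: Node) -> Optional[Node]:
    """Active strategy: the system jumps to the next missing piece of information."""
    return first_missing(root)


def next_focus_multimodal(root: Node, just_filled: Node) -> Node:
    """Passive strategy: only go up one level and let the user decide what is next."""
    return just_filled.parent if just_filled.parent is not None else just_filled


# Illustrative hierarchy (node names are invented).
root = Node("query")
meeting = Node("meeting", root)
month = Node("month", meeting)
year = Node("year", meeting)
topic = Node("topic", root)

month.value = "April"
print(next_focus_vocal(root, month).name)       # system asks for the next slot
print(next_focus_multimodal(root, month).name)  # system returns to the parent

# In the multimodal case the user can still move the focus directly,
# e.g. by selecting the graphical component of another node:
focus = topic
```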
Experimental results (preliminary)
- Conclusions
  - Adding new modalities (screen) to vocal dialogue systems is possible, but it substantially changes the way a user interacts with the system.
  - Though such a system resembles a traditional GUI, natural language is perceived as useful and is used in a number of specific situations.
  - Results suggest that simply augmenting a GUI with spoken commands for navigation is not appreciated by users: speech is mostly used to provide search criteria (and to control invisible GUI elements), while the mouse is preferred for navigation within the interface.
Is modality selection random?
How often were language modalities used?
- Language represents an important fraction of the
interaction!
- Language is used to provide search criteria, the mouse to navigate.