Title: Audio Signal Processing and Transmission: Design of Interactive Speech Communication Systems
1 Audio Signal Processing and Transmission: Design of Interactive Speech Communication Systems
- doc. Ing. Jozef Juhár, CSc.
2 Contents
- Dialogue design
- Dialogue management
- Natural dialogues
- Simulated studies (Wizard of Oz)
3 Dialogue Design
- Technology dominates: in many cases, communication is not based on the best possible solutions; instead, the technology limits choices and even dictates the design (Mane et al., 1996).
- Many technical limitations can be compensated for with a properly designed speech interface (Kamm, 1994): the dialogue!
- Therefore, speech interface design may have a great impact on overall system quality.
4 Conversation Techniques
- Conversation design
  - Dialogue strategy: system initiative, user initiative, mixed initiative
  - Turn taking: users have a lot of learned skills from human-to-human communication
  - Prompts: choosing the right words, length, guidance, etc.
  - Confirmations
    - explicit confirmations: heavy, always require user action
    - implicit confirmations: light, user action is avoidable
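The difference between the two confirmation styles can be sketched in a few lines of Python. This is only an illustration: the prompt wording, the function names and the confidence threshold of 0.6 are assumptions made for the example, and choosing the style from a recognizer confidence score is one common option, not something the slides prescribe.

```python
def explicit_confirmation(value: str) -> str:
    """Heavy: a separate yes/no question that the user must always answer."""
    return f"I heard {value}. Is that correct?"


def implicit_confirmation(value: str, next_question: str) -> str:
    """Light: echo the value inside the next prompt; the user reacts only if it is wrong."""
    return f"OK, {value}. {next_question}"


def choose_confirmation(value: str, confidence: float, next_question: str,
                        threshold: float = 0.6) -> str:
    """Hypothetical policy: confirm explicitly only when recognition confidence is low."""
    if confidence < threshold:
        return explicit_confirmation(value)
    return implicit_confirmation(value, next_question)


if __name__ == "__main__":
    print(choose_confirmation("Kosice", 0.3, "When do you want to leave?"))
    print(choose_confirmation("Kosice", 0.9, "When do you want to leave?"))
```

With explicit confirmation the dialogue gains one extra turn per value; with implicit confirmation the extra turn appears only when the user objects.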
5 Error Handling and Help
- Error correction
  - Preventing the user from making errors
  - Detection of errors
  - Finding out the causes of the errors
  - Planning of error correction
  - Error correction
- Feedback and help
  - very important in speech-only interfaces
6 Design Process
- Iterative process
  - interfaces do not work as designed: guidelines are missing, users behave differently than expected, etc. => need for empirical data
- Three sources of information
  - human-to-human communication (natural dialogues),
  - simulated studies (Wizard of Oz studies) and
  - human-to-computer communication (e.g., rapid prototyping).
7 Design Steps
- Data collection
  - existing applications and early prototypes
  - human-to-human communication
  - WoZ studies
- Design
  - interface specification
  - prototyping
- Evaluation
  - WoZ studies
  - empirical evaluation of prototypes
8 Analysis of Natural Dialogues (1)
- Natural human-human conversations in the application domain
- E.g., the participants of a timetable guidance service are recorded for later annotation and analysis.
- Can inform the design of human-computer interaction, if properly applied
- Usually done at an early point of the design process
9 Analysis of Natural Dialogues (2)
- Used for
  - Defining / refining the tasks that the application must deal with, its requirements and functionality
  - Finding out how people communicate with each other: vocabulary and grammar design
  - Help, prompt design, feedback and guidance
  - Determining the overall tone of the conversation
10 Analysis of Natural Dialogues (3)
- Limitations
  - Applicability to human-computer interaction?
  - The results can be misleading and result in impractical or unusable systems.
  - The applicability of results from human-human experiments should be verified before using them as the basis for designing human-computer interaction.
11 Dialogue Analysis with the Wizard of Oz Method (1)
- The idea is that a human operator simulates (some parts of) the computer.
- Usually the user believes that he/she is interacting with a computer
- At least the initial dialogue design should be fixed
- The interaction is recorded and analyzed
- Should reveal major usability problems with the design
- Should reveal the interaction patterns in computer-human dialogue
12 Dialogue Analysis with the Wizard of Oz Method (2)
- Used for
  - Preliminary usability testing
  - Finding interaction design flaws
  - Refining vocabulary / grammar
  - Finding differences in human-human vs. human-computer interaction
  - Verifying designed interaction techniques before they are implemented
13 Dialogue Analysis with the Wizard of Oz Method (3)
- Initial dialogue design must be fixed
- The system functionality must be consistent
  - If not, this can lead to false results
- Can be done in different phases of the development process
- The whole system can be simulated
  - Human operator; speech is computerised with the use of signal processing (e.g., a vocoder)
- A part of the system is replaced with a human operator
  - E.g., the speech recognition engine
14 Dialogue Analysis with the Wizard of Oz Method (4)
- WoZ studies share the applicability problem with human-human experiments: especially if the users know the real nature of the system, they may behave differently than with a real system.
- In a bus travel information system, WoZ experiment results did not correspond to the studies conducted later with a working system (Johnsen et al., 2000).
15 Dialogue Analysis with the Wizard of Oz Method (5)
- It is not trivial to simulate computer applications in a coherent way and at the same time to respond accurately and fast enough.
- The simulation of errors and other technology-related limitations may be difficult.
- In some cases, it may not be possible to simulate systems at all.
16 Dialogue Analysis with the Wizard of Oz Method (6)
- Badly conducted tests can lead to misleading results
  - Conducting WoZ studies on a badly tested / poorly finished application usually reveals only the bugs of the application
- Conducting WoZ experiments is laborious and work intensive
  - Needs at least one person all the time to control the system
  - Usually needs special applications to control the dialogue
17 WoZ Tools
- Suede: A Wizard of Oz Prototyping Tool for Speech User Interfaces (video)
  - Rapid testing of interaction design
  - Iterations of the design can be made quickly before actual implementation of the system
  - Can be downloaded from http://guir.berkeley.edu/projects/suede
18 Human-Computer Communication
- If possible, existing applications (prototypes, similar systems) can be used to collect data for analysis and as a basis for the design.
- Rapid prototyping might be a better solution than natural recordings and WoZ studies.
19 Rapid Prototyping
- Tools available
  - CSLU Toolkit
  - VoiceXML
- These tools have several restrictions
  - when the development reaches the limits of the toolkit, the development must be redone from scratch with the real tools
  - Languages other than English are poorly represented
20 CSLU Toolkit
- The CSLU Toolkit: A Platform for Research and Development of Spoken-Language Systems
- Center for Spoken Language Understanding / Oregon Graduate Institute of Science and Technology
- Development started in 1992 (!)
- Free for research use
- Available from http://www.cslu.ogi.edu/toolkit
21 CSLU Toolkit
- Toolkit structure
  - core technologies
    - speech recognition (CSLU)
    - speech synthesis (Festival, University of Edinburgh)
    - facial animation
  - toolkit levels
    - C level: low-level functions
    - package level: C interface
    - script level: Tcl (recognition, TTS, face animation)
    - GUI level: RAD
22 Dialogue Management
- Two viewpoints
  - Dialogue management strategies
    - How is the initiative handled?
    - The strategy used may be system-initiative, user-initiative or mixed-initiative.
  - Dialogue control model
    - Refers to the ways in which the dialogue is implemented from the point of view of the system.
23 System Initiative (1)
- The computer asks the user questions to obtain the information needed to compute a solution and produce a response.
- Can be highly efficient, since the paths which the dialogue flow can take are limited and predictable.
- The most challenging issue for dialogue management is to handle errors successfully and ask the user relevant questions.
24 System Initiative (2)
- The dialogue flow is predictable, which makes it possible to use context-sensitive recognition grammars (every dialogue state can have a tailored recognition grammar).
- In non-optimal situations, such as telephone applications or public information kiosks, this can make the application usable even if the recognizer can use only simple recognition grammars.
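A state-specific grammar can be illustrated with a small Python sketch. Everything here is hypothetical: the state names, the word lists, and the assumption that the recognizer delivers an n-best list of single-word hypotheses ordered best first. A real recognizer would apply the grammar during decoding rather than as a filter afterwards.

```python
# Hypothetical grammars: each dialogue state accepts only a small vocabulary.
GRAMMARS = {
    "ask_confirm": {"yes", "no"},
    "ask_day": {"today", "tomorrow", "monday", "friday"},
}


def recognize(state, hypotheses):
    """Return the best hypothesis that the grammar of the current state allows.

    `hypotheses` is assumed to be an n-best list ordered from most to least likely.
    Returns None when no hypothesis fits the grammar (a rejection).
    """
    allowed = GRAMMARS[state]
    for word in hypotheses:
        if word in allowed:
            return word
    return None


if __name__ == "__main__":
    # "now" is acoustically close to "no" but is not in the yes/no grammar.
    print(recognize("ask_confirm", ["now", "no"]))
    print(recognize("ask_day", ["no", "today"]))
```

Because each state only has to distinguish a handful of words, even a simple recognizer can reach usable accuracy in noisy conditions.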
25 System Initiative (3)
- The system guides the user towards his/her goal.
- Since the system asks questions, the user can be sure that all necessary steps will be performed.
- This makes the user feel comfortable with the system and prevents disorientation.
- Particularly suitable for novice users who do not know how the system works.
26 System Initiative (4)
- Interaction might be clumsy with experienced users.
  - Especially if the system assumes that only a single piece of information is exchanged in every dialogue turn
- This clumsiness can be reduced by letting the system accept multiple pieces of information in a single utterance => experienced users may skip certain dialogue turns by using more complicated expressions.
- This makes the dialogue management and the recognition grammars more complex.
27 System Initiative (5)
- Most suitable for well-defined, sequential tasks where the system needs to know certain pieces of information in order to perform a database query or similar information retrieval tasks.
- Open-ended tasks cannot be modeled using sequential tasks without the interface becoming inefficient and inflexible.
- There are different tasks in many applications, and although one dialogue strategy may not be suitable for the overall dialogue flow, it may be suitable in some parts of the dialogue.
28 User Initiative (1)
- The system waits for user inputs and reacts to these by performing corresponding operations.
- Assumes that the user knows what to do and how to interact with the system.
- Often called the command-and-control approach, although the language used may be rather sophisticated.
- The user is the active participant in these systems regarding the dialogue initiative.
29 User Initiative (2)
- Experienced users are able to use the system freely and perform operations any way they like without the system getting in their way.
- This is natural in open-ended tasks which have many independent subtasks.
30 User Initiative (3)
- Requires that users are familiar with the system and know how to speak to it.
- The common argument favoring user-initiative systems is that if the natural language understanding capabilities of the system are advanced, the system can understand freely spoken natural language utterances.
31 User Initiative (4)
- Freely spoken natural language utterances are seldom realistic, since the use of unrestricted language leads to very open language models, which most commercial speech recognizers cannot handle.
- Even if the computer could understand freely expressed sentences, the user would have to know the task structure in order to give all the necessary information to the computer. This increases the cognitive load on the user.
32 Mixed-Initiative (1)
- Both system-initiative and user-initiative dialogue strategies have their advantages and disadvantages => there is no single dialogue management strategy which is suitable for all situations.
- Different users and application domains have different needs, and the accuracy of the speech recognizer also affects the selection of dialogue strategy.
- Different dialogue strategies are needed for different situations.
33 Mixed-Initiative (2)
- Walker et al. (1998) found that mixed-initiative dialogues are more efficient but not as preferred as system-initiative dialogues in the e-mail domain.
- They argue that this is mainly because of the low learning curve and predictability of system-initiative interfaces.
- System-initiative interfaces, on the other hand, are less efficient and could frustrate more experienced users.
- This supports the view that different dialogue handling strategies are needed even inside single applications.
34 Mixed-Initiative (3)
- Assumes that the initiative can be taken either by the user or the system.
- The user has freedom to take the initiative, but when there are problems in the communication, or the task requires it, the system takes the initiative and guides the interaction.
- Applications can use the mixed-initiative strategy in different ways. For example, tasks may form a hierarchy in which different subtasks can use different dialogue strategies.
35 Mixed-Initiative (4)
- The system can adapt the style of the interaction to suit particular users or situations based on the success of the interaction.
- This can be done, e.g., by using the system-initiative strategy at the beginning and letting the user take more initiative when she or he learns how to interact with the system.
- If the user has problems with the user-initiative strategy and the interaction is not proceeding as well as expected, the system can take the lead.
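The adaptation described on this slide can be sketched as a small controller that switches strategy according to how recent turns went. This is a minimal illustration under stated assumptions: the class name, the two mode labels and the thresholds (three consecutive understood turns to hand over the initiative, two consecutive failures to take it back) are invented for the example.

```python
class AdaptiveStrategy:
    """Start system-initiative; hand over or reclaim the initiative based on turn success."""

    def __init__(self, promote_after=3, demote_after=2):
        self.mode = "system"               # "system" or "user" initiative
        self.promote_after = promote_after  # hypothetical threshold
        self.demote_after = demote_after    # hypothetical threshold
        self.successes = 0                  # consecutive understood turns
        self.failures = 0                   # consecutive failed turns

    def record_turn(self, understood: bool) -> str:
        """Update the counters after one turn and return the strategy for the next turn."""
        if understood:
            self.successes += 1
            self.failures = 0
            if self.mode == "system" and self.successes >= self.promote_after:
                self.mode = "user"
        else:
            self.failures += 1
            self.successes = 0
            if self.mode == "user" and self.failures >= self.demote_after:
                self.mode = "system"
        return self.mode


if __name__ == "__main__":
    strategy = AdaptiveStrategy()
    for ok in [True, True, True, False, False]:
        print(ok, "->", strategy.record_turn(ok))
```

A single misunderstanding does not take the initiative away from the user; only a run of failures does, which avoids flipping strategies on every recognition error.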
36 Mixed-Initiative (5)
- A mixed-initiative system can help the user by employing the system-initiative strategy while still preserving the freedom and efficiency of the user-initiative strategy.
- In practice, the mixed-initiative strategy is often a synonym for the user-initiative strategy with system-initiated error handling.
37 Mixed-Initiative (6)
- If the dialogue is modeled using the user-initiative strategy with the addition of several system-initiative sub-dialogues, the support for system-initiative dialogues may be rather limited.
- If a predominantly system-initiative system allows the user to take the lead, the system may suffer from the problems of the user-initiative strategy without gaining any real advantage for the interaction.
39 Basic Approaches to Dialogue Control
- Finite-state machines
- Frame based dialogue systems
- AI / Agent based dialogue systems
40 Finite-state Machines (1)
- Consists of a set of nodes representing dialogue states and a set of arcs between the nodes.
- Arcs represent transitions between states. The resulting network represents the whole dialogue structure.
- Paths through the network represent all the possible dialogues which the system is able to produce.
- Typically, nodes represent computer responses and arcs represent user inputs, which move the dialogue from one state to another.
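A minimal Python sketch of such a network follows. The timetable task, the state names, the prompts and the coarse input label "city" are all assumptions chosen for illustration; nodes carry the computer's prompts and arcs are keyed by the kind of user input.

```python
# Nodes: dialogue states, each with the prompt the computer produces there.
PROMPTS = {
    "ask_origin": "Where are you leaving from?",
    "ask_destination": "Where do you want to go?",
    "done": "Thank you. Looking up the timetable.",
}

# Arcs: (current state, type of user input) -> next state.
ARCS = {
    ("ask_origin", "city"): "ask_destination",
    ("ask_destination", "city"): "done",
}


def step(state, user_input):
    """Follow the matching arc; stay in the same state if no arc matches."""
    return ARCS.get((state, user_input), state)


def run(inputs, state="ask_origin"):
    """Walk one path through the network and collect the prompts that were spoken."""
    transcript = [PROMPTS[state]]
    for user_input in inputs:
        state = step(state, user_input)
        transcript.append(PROMPTS[state])
    return state, transcript


if __name__ == "__main__":
    final_state, transcript = run(["city", "noise", "city"])
    print(final_state)
    print("\n".join(transcript))
```

Every dialogue the system can produce corresponds to one path through `ARCS`; an unexpected input here simply repeats the current prompt.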
41 Finite-state Machines (2)
- Represents dialogues explicitly and in an easily computable way.
- States can also be used to model the task structures and context knowledge. For example, there can be a specific recognition grammar associated with every state.
42 Finite-state Machines (3)
- Extensions to the basic model include sub-dialogues, or in a more general form different hierarchically organized finite-state machines.
- In order to reduce connections between states, sub-dialogues can be global states, which means that there are default transitions from all other states to these states.
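The saving from global states can be shown with a short sketch. The states, inputs and the two global commands ("help", "cancel") are hypothetical; the point is that the default transitions are stored once instead of being drawn from every state.

```python
STATES = ["ask_origin", "ask_destination", "confirm", "help", "goodbye"]

# Ordinary arcs between specific states.
LOCAL_ARCS = {
    ("ask_origin", "city"): "ask_destination",
    ("ask_destination", "city"): "confirm",
    ("confirm", "yes"): "goodbye",
    ("confirm", "no"): "ask_origin",
}

# Global states: reachable from any state through a default transition.
GLOBAL_ARCS = {"help": "help", "cancel": "goodbye"}


def next_state(state, user_input):
    """Check the default (global) transitions first, then the local arcs."""
    if user_input in GLOBAL_ARCS:
        return GLOBAL_ARCS[user_input]
    return LOCAL_ARCS.get((state, user_input), state)


def explicit_arc_count(states, global_arcs):
    """Arcs needed if every global transition were drawn from each other state."""
    return sum(1 for s in states for target in global_arcs.values() if s != target)


if __name__ == "__main__":
    print(next_state("ask_destination", "help"))
    print(len(GLOBAL_ARCS), "default transitions instead of",
          explicit_arc_count(STATES, GLOBAL_ARCS), "explicit arcs")
```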
43 Finite-state Machines (4)
- Most suitable for well-structured and compact tasks and small-scale applications.
- If there are numerous states and a lot of transitions between states, the complexity of the dialogue model increases rapidly.
- Common operations which can take place in most situations, such as error correction procedures, increase this complexity enormously.
44 Finite-state Machines (5)
- Not the best possible solution when the task structure is complex or does not correspond to the dialogue structure.
- When the number of different possibilities, i.e., the number of connections between states, increases, the dialogue model becomes unmanageable even if divided into subtasks.
45 Frames (1)
- Templates (i.e., collections of information) are used as a basis for dialogue management.
- The purpose of the dialogue is to fill the necessary information slots, i.e., to find values for the required variables and then perform a query or similar operation on the basis of the frame.
46 Frames (2)
- The heart of form-based dialogues is the implementation of the dialogue control algorithm, i.e., the algorithm which chooses how to react to the user inputs.
- Variations of the template approach include schemas, e-forms, task-structure graphs and type hierarchies (McTear, 2002).
47 Frames (3)
- Frame-based systems are more open than state machines, since there is no predefined dialogue flow. The dialogue can take any form to fill the necessary slots (in theory).
- Multiple slots can be filled by using a single utterance, and the order of filling the slots is free.
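Order-free slot filling can be sketched as follows. This is a toy illustration: the slot names and the regular-expression patterns stand in for a real language-understanding component, and the timetable domain is only assumed.

```python
import re

# Hypothetical patterns, one per slot; each looks for its own cue in the utterance.
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom (\w+)"),
    "destination": re.compile(r"\bto (\w+)"),
    "day": re.compile(r"\b(today|tomorrow)\b"),
}


def new_frame():
    """An empty frame: every slot is still unfilled."""
    return {slot: None for slot in SLOT_PATTERNS}


def fill_slots(frame, utterance):
    """Fill every still-empty slot whose pattern matches, regardless of word order."""
    text = utterance.lower()
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match and frame[slot] is None:
            frame[slot] = match.group(1)
    return frame


if __name__ == "__main__":
    print(fill_slots(new_frame(), "To Presov from Kosice tomorrow"))
    print(fill_slots(new_frame(), "From Kosice"))
```

One utterance may fill all three slots or only one; the frame simply records whatever was found, so no dialogue path has to be enumerated in advance.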
48 Frames (4)
- There are practical limitations, as well as dependencies between slots, which make these systems a little more complicated and the possible dialogue paths more restricted than in theory.
- The frame-based dialogue control model is a more natural choice for implementing a mixed-initiative dialogue strategy than the finite-state model, since the computer may take the initiative by simply asking for the required fields.
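How the system takes the initiative in a frame-based design can be sketched as a function that picks the next question from the unfilled slots. The slot names, the questions and the single dependency (departure time is only asked once the day is known) are assumptions made up for this example.

```python
# Questions in the order the system would prefer to ask them.
QUESTIONS = {
    "origin": "Where are you leaving from?",
    "destination": "Where do you want to go?",
    "time": "At what time do you want to leave?",
    "day": "On which day do you want to travel?",
}

# Hypothetical dependency between slots: "time" may only be asked after "day".
REQUIRES = {"time": ["day"]}


def next_prompt(frame):
    """Return the question for the first empty slot whose dependencies are filled.

    Returns None when the frame is complete and the query can be performed.
    """
    for slot, question in QUESTIONS.items():
        if frame.get(slot) is not None:
            continue  # already filled, possibly by an earlier free-form utterance
        if all(frame.get(dep) is not None for dep in REQUIRES.get(slot, [])):
            return question
    return None


if __name__ == "__main__":
    frame = {"origin": "kosice", "destination": "presov", "time": None, "day": None}
    print(next_prompt(frame))   # asks for the day first because of the dependency
    frame["day"] = "tomorrow"
    print(next_prompt(frame))
```

The user stays free to volunteer any slot at any time; the system only steps in to ask for what is still missing, which is the mixed-initiative behaviour described above.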
49 Agent-based Dialogue Control
- Both dialogue partners are seen as intelligent in the sense that they have knowledge and expectations about the task at hand
- The initiative tends to be mixed
- The goal is to engage in a cooperative dialogue with the user
- The system may provide an answer that does not exactly match the user's request, but rather what the system thinks might be in the interest of the user
- The user may introduce new subjects into the conversation
- Basically, the system and the user share the same problem/task, which they try to solve together.
50 Other Control Approaches
- Event-based systems
- Collaborative agents
- Theorem-proving systems
- Dialogue description languages
51 Summary
- State-based approach
  - Useful in small-scale applications, where the structure of the dialogue can easily be modelled as separate states
  - Especially system-initiative dialogues
- Frame-based approach
  - Useful when the user must be allowed to give the inputs in a freer form (number of items, order of items)
  - Especially user-initiative dialogues
52 Lecture 5
- Prompt Design
- Prompt Design Guidelines
- Prompting Techniques
- Advanced Techniques
- Tutoring Agents
- Universal Speech Interfaces
54 Prompt Design
- Prompting is a key issue for successful interaction
- People adapt to the way that the computer speaks and use both the same style and words which occur in the computer's turns.
- Prompts can guide the interaction in the desired direction and help ASR, NLU and dialogue management components to understand the user utterances better.
- Even simple prompts may cause misunderstanding if they are poorly constructed.
  - Even in yes/no questions (Hockey et al., 1997).
- Prompting techniques allow the system to adapt to both experienced and novice users.
55 Foundations
- Memory restrictions
  - 7 ± 2 rule
  - Length of the prompts (speech is a temporal medium)
- Communication style
  - Barge-in supported?
- Relation to error management
  - Implicit and explicit confirmations
  - Error correction