Title: Increased Robustness in Spoken Dialog Systems (roadmap to a thesis proposal)
1. Increased Robustness in Spoken Dialog Systems
- (roadmap to a thesis proposal)
- Dan Bohus
- SPHINX Lunch, May 2003
2. The problem
- S: Are you a registered user?
- U: No I'm not. No   [ASR: NO I'M NOT NO]
- S: What is your full name?
- U: Adam Schumacher   [ASR: ATHENS IN AKRON]
- S: There is an Athens in Georgia and in Greece. Which destination did you want?
- U: Neither   [ASR: NEITHER]
- S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
- U: Georgia   [ASR: GEORGIA]
- S: A flight from Athens... Where do you want to go?
- U: Start over   [ASR: START OVER]
- S: Do you really want to start over?
- U: Yes   [ASR: YES]
- S: What city are you leaving from?
- U: Hamilton, Ontario   [ASR: HILTON ONTARIO]
- S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
- U: Hamilton   [ASR: HILTON]
- S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
- U: Toronto   [ASR: TORONTO]
3. Is this really a big problem?
- Lots of anecdotal evidence
- And some statistics
- CMU Communicator:
  - 66% of the sessions contain a serious misunderstanding
    - 26% result in a complete breakdown in interaction
    - The remaining 40% are still frustrating experiences
4. More statistics
- USC study, Shin et al (1)
  - Labeled errors and user behavior on the Communicator (multi-site) corpus
  - Average of 1.66 error segments per session
  - 78% of error segments get back on track
  - 37% of the sessions have errors leading to a complete breakdown in interaction
5. Yet more statistics
- Utterance-level understanding error rates:
  - CMU Communicator: 32.4% → 66% of sess. (Rudnicky, Bohus et al (2))
  - CU Communicator: 27.5% → ?% of sess. (San-Segundo (3))
  - HMIHY (AT&T): 36.5% → ?% of sess. (Walker (4))
  - Jupiter (MIT): 28.5% → ?% of sess. (Hazen (5))
6. It is a significant problem!
- 10-30% of sessions lead to interaction breakdowns
- 60-70% of sessions contain misunderstandings
7. Goal of proposed work
- [Figure: reduce interaction breakdowns and the number of sessions containing misunderstandings]
8. Outline
- The problem
- Sources of the problem
- The approach
- Infrastructure: the RavenClaw framework
- Proposed work, in detail
- Discussion
9. The problems, in more detail
- S: Are you a registered user?
- U: No I'm not. No   [ASR: NO I'M NOT NO]
- S: What is your full name?
- U: Adam Schumacher   [ASR: ATHENS IN AKRON]
- S: There is an Athens in Georgia and in Greece. Which destination did you want?
- U: Neither   [ASR: NEITHER]
- S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
- U: Georgia   [ASR: GEORGIA]
- S: A flight from Athens... Where do you want to go?
- U: Start over   [ASR: START OVER]
- S: Do you really want to start over?
- U: Yes   [ASR: YES]
- S: What city are you leaving from?
- U: Hamilton, Ontario   [ASR: HILTON ONTARIO]
- S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
- U: Hamilton   [ASR: HILTON]
- S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
- U: Toronto   [ASR: TORONTO]
10. Three contributing factors
- 1. Low accuracy of speech recognition
- 2. Inability to assess reliability of beliefs
- 3. Lack of efficient error recovery and prevention mechanisms
11. Factor 1: Low recognition accuracy
- ASR is still imperfect at best
- Variability: environmental, speaker
- 10-30% WER in spoken language systems
- Tradeoff: accuracy vs. system flexibility
- Effect: main source of errors in SDS
  - WER is the most important predictor of user satisfaction, Walker et al (6,7)
  - Users prefer less flexible, more accurate systems, Walker et al (8)
12. Factor 2: Inability to assess reliability of beliefs
- Errors typically propagate to the upper levels of the system, leading to:
  - Non-understandings
  - Misunderstandings
- Effect: misunderstandings are taken as facts and acted upon
  - At best: extra turns, user-initiated repairs, frustration
  - At worst: complete breakdown in interaction
13. Factor 3: Lack of recovery mechanisms
- Small number of strategies
  - Implicit and explicit verifications are the most popular
- Sub-optimal implementations
  - Triggered in an ad-hoc / heuristic manner
  - Problem is often regarded as an add-on
  - Non-uniform, domain-specific treatment
- Effect: systems are prone to complete breakdowns in interaction
14. Outline
- The problem
- Sources of the problem
- The approach
- Infrastructure: the RavenClaw framework
- Proposed work, in detail
- Discussion
15. Three contributing factors
- 1. Low accuracy of speech recognition
- 2. Inability to assess reliability of beliefs
- 3. Lack of efficient error recovery and prevention mechanisms
16. Approach 1
- 1. Low accuracy of speech recognition
- 2. Inability to assess reliability of beliefs
- 3. Lack of efficient error recovery and prevention mechanisms
17. Approach 2
- 1. Low accuracy of speech recognition
- 2. Inability to assess reliability of beliefs
- 3. Lack of efficient error recovery and prevention mechanisms
18. Why not just fix ASR?
- ASR performance is improving, but requirements are increasing too
- ASR will not become perfect anytime soon
- ASR is not the only source of errors
- Approach 2: ensure robustness under a large variety of conditions
19. Proposed solution
- Assuming the inputs are unreliable:
  A. Make systems able to assess the reliability of their beliefs
  B. Optimally deploy a set of error prevention and recovery strategies
20. Proposed solution, more precisely
- Assuming the inputs are unreliable:
  1. Compute grounding state indicators
     - reliability of beliefs (confidence annotation / updating)
     - correction detection
     - goodness-of-dialog metrics
     - other user models, etc.
  B. Optimally deploy a set of error prevention and recovery strategies
21. Proposed solution, more precisely
- Assuming the inputs are unreliable:
  1. Compute grounding state indicators
     - reliability of beliefs (confidence annotation / updating)
     - correction detection
     - goodness-of-dialog metrics
     - other user models, etc.
  2. Define the grounding actions
     - error prevention and recovery strategies
  3. Create a grounding decision model
     - decides upon the optimal strategy to employ at a given point
- Do it in a domain-independent manner!
22. Outline
- The problem
- Sources of the problem
- The approach
- Infrastructure: the RavenClaw framework
- Proposed work, in detail
- Discussion
23. The RavenClaw DM framework
- Dialog management framework for complex, task-oriented dialog systems
- Separation between Dialog Task and Generic Conversational Skills
  - Developer focuses only on the Dialog Task description
  - Dialog Engine automatically ensures a minimum set of conversational skills
  - Dialog Engine automatically ensures the grounding behaviors
24. RavenClaw architecture
- [Figure: dialog task tree with agents Communicator, Welcome, Login, Travel, Locals, Bye, AskRegistered, GreetUser, GetProfile, Leg1, AskName, DepartLocation, ArriveLocation]
- Dialog Task implemented by a hierarchy of agents
- Information captured in concepts
  - Probability distributions over sets of values (see the sketch below)
  - Support for belief assessment and grounding mechanisms
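To make "probability distributions over sets of values" concrete, here is a minimal sketch; it is not the actual RavenClaw API, and the class and method names are hypothetical.

```python
# Hypothetical sketch of a RavenClaw-style "concept": a slot whose value is
# held as a probability distribution over candidate hypotheses rather than a
# single best guess. Names and structure are illustrative only.

class Concept:
    def __init__(self, name):
        self.name = name
        self.belief = {}          # value -> probability

    def update_from_hypotheses(self, scored_hypotheses):
        """Set the belief from (value, score) pairs, e.g. an ASR n-best list."""
        total = sum(score for _, score in scored_hypotheses) or 1.0
        self.belief = {value: score / total for value, score in scored_hypotheses}

    def top_hypothesis(self):
        """Return the most likely value and its confidence."""
        if not self.belief:
            return None, 0.0
        value = max(self.belief, key=self.belief.get)
        return value, self.belief[value]

# Example: the departure city concept after a noisy recognition result.
depart_city = Concept("departure_city")
depart_city.update_from_hypotheses([("Hamilton", 0.4), ("Hilton", 0.6)])
print(depart_city.top_hypothesis())   # ('Hilton', 0.6) -- a likely misrecognition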
25. Domain-Independent Grounding
26. RavenClaw-based systems
- LARRI (Symphony): Language-based Assistant for Retrieval of Repair Information
- IPA (NASA Ames): Intelligent Procedure Assistant
- BusLine (Let's Go!): Pittsburgh bus route information
- RoomLine: conference room reservation at CMU
- TeamTalk (11-754): spoken command and control for a team of robots
27. Outline
- The problem
- Sources of the problem
- The approach
- Infrastructure: the RavenClaw framework
- Proposed work, in detail
- Discussion
28. Previous / Proposed Work Overview
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
29. Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
30. Reliability of beliefs
- Continuously assess the reliability of beliefs
- Two sub-problems:
  - Computing the initial confidence in a concept
    - The confidence annotation problem
  - Updating confidence based on events in the dialog
    - User reaction to implicit or explicit verifications
    - Domain reasoning
31. Confidence annotation
- Traditionally focused on ASR: Chase (9)
- More recently, interest in CA geared towards use in SDS: Walker (4), San-Segundo (3), Hazen (5), Rudnicky, Bohus et al (2)
  - Utterance-level and concept-level CA
- Integrating multiple features (see the sketch below)
  - ASR: acoustic / LM scores, lattice, n-best
  - Parser: various measures of parse goodness
  - Dialog Management: state, expectations, history, etc.
- 50% relative improvement in classification error
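As a rough illustration of combining heterogeneous features into a confidence score: this is a sketch under assumed feature names and a logistic-regression learner, not the actual feature set or classifier behind the figure above.

```python
# Minimal sketch of concept-level confidence annotation as binary classification.
# Feature names and the logistic-regression choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row combines ASR-, parser-, and dialog-level features for one concept
# hypothesis; the label says whether the hypothesis was in fact correct.
feature_names = [
    "asr_acoustic_score", "asr_lm_score", "nbest_agreement",   # ASR features
    "parse_fragmentation", "parse_coverage",                   # parser features
    "dialog_expectation_match", "turn_number",                 # dialog features
]

X_train = np.array([
    [0.82, 0.75, 0.9, 0.1, 0.95, 1.0, 3],
    [0.41, 0.30, 0.2, 0.6, 0.40, 0.0, 7],
    [0.77, 0.68, 0.8, 0.2, 0.90, 1.0, 2],
    [0.35, 0.25, 0.1, 0.7, 0.35, 0.0, 9],
])
y_train = np.array([1, 0, 1, 0])   # 1 = concept value was correct

annotator = LogisticRegression().fit(X_train, y_train)

# At runtime, the predicted probability becomes the concept's initial confidence.
new_features = np.array([[0.60, 0.55, 0.5, 0.3, 0.7, 1.0, 4]])
initial_confidence = annotator.predict_proba(new_features)[0, 1]
print(f"initial confidence: {initial_confidence:.2f}")
```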
32. Confidence annotation: To-Do List
- Improve accuracy even more
  - More features / fewer features / better features
- Study transferability across domains
  - Q: Can we identify a set of features that transfer well?
  - Q: Can we use un- or semi-supervised learning, or bootstrap from little data and an annotator in a different domain?
33. Confidence updating
- To my knowledge, not really studied yet!
34. Confidence updating: approaches
- Naïve Bayesian updating (see the sketch below)
  - Assumptions do not match reality
- Analytical model
  - Set of heuristic / probabilistic rules
- Data-driven model
  - Define events as features
  - Learning task: Initial Conf. + E1, E2, E3, ... → Current Conf. (1/0)
- Bypass confidence updating
  - Keep all events as grounding state indicators (doesn't lose that much information)
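For the naïve Bayesian option, here is a minimal sketch of what updating a concept's confidence from dialog events could look like; the event likelihoods are invented for illustration, and a real model would estimate them from data.

```python
# Illustrative sketch of naive Bayesian confidence updating: the concept's
# confidence is treated as P(correct), and each observed dialog event E is
# assumed conditionally independent given correctness. Likelihoods are made up.

# P(event | concept value is correct), P(event | concept value is incorrect)
EVENT_LIKELIHOODS = {
    "user_confirmed_explicitly":     (0.90, 0.10),
    "user_corrected_after_implicit": (0.05, 0.60),
    "user_ignored_implicit_confirm": (0.70, 0.30),
}

def update_confidence(confidence, events):
    """Update P(correct) after a sequence of observed dialog events."""
    p_correct, p_incorrect = confidence, 1.0 - confidence
    for event in events:
        l_correct, l_incorrect = EVENT_LIKELIHOODS[event]
        p_correct *= l_correct
        p_incorrect *= l_incorrect
    total = p_correct + p_incorrect
    return p_correct / total if total > 0 else confidence

# Example: an initially uncertain departure city, then the user corrects it
# after an implicit confirmation ("traveling from Hilton..." -> "no, Hamilton").
print(update_confidence(0.6, ["user_corrected_after_implicit"]))   # drops well below 0.6
```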
35. Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
36. Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
37. Correction Detection
- Automatically detect, at run-time, correction sites or aware sites
- Another data-driven classification task (see the feature sketch below)
  - Prosodic features, bag-of-words features, lexical markers: Litman (10), Bosch (11), Swerts (12), Levow (13)
- Useful for:
  - implementation of implicit / explicit verifications
  - belief assessment / updating
  - as a direct indicator for grounding decisions
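As an illustration of the kind of prosodic and lexical evidence such a detector might use; the feature choices are my assumptions, loosely inspired by the cited work rather than their exact feature sets.

```python
# Illustrative feature extraction for correction / aware-site detection.
# Marker words and hyperarticulation proxies are assumed, not taken from the
# cited papers; a real detector would feed these into a trained classifier.

CORRECTION_MARKERS = {"no", "not", "wrong", "said", "meant", "repeat"}

def correction_features(turn_text, turn_duration_sec, prev_turn_text,
                        mean_pitch_hz, prev_mean_pitch_hz):
    words = turn_text.lower().split()
    prev_words = prev_turn_text.lower().split()
    return {
        # lexical cues
        "num_marker_words": sum(w in CORRECTION_MARKERS for w in words),
        "repeats_prev_words": len(set(words) & set(prev_words)) / max(len(words), 1),
        # prosodic cues (hyperarticulated corrections often raise pitch, slow down)
        "pitch_increase": mean_pitch_hz - prev_mean_pitch_hz,
        "speaking_rate": len(words) / max(turn_duration_sec, 1e-3),
        "turn_length": len(words),
    }

# Example: "HILTON" was heard, the user repeats "Hamilton" more slowly and higher.
print(correction_features("no Hamilton", 1.8, "Hamilton Ontario", 215.0, 190.0))
```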
38. Correction Detection: To-Do List
- Build an aware-site detector
  - Q: Can we identify what the user is correcting?
- Study transferability across domains
  - Q: Can we identify a set of features that transfer well?
  - Q: Can we use un- or semi-supervised learning, or bootstrap from little data and a detector in a different domain?
39. Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
40. Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
41. Goodness-of-dialog indicators
- Assessing how well a conversation is advancing
- Non-understandings
  - Q: Can we identify the cause?
  - Q: Can we relate a non-understood utterance to a dialog expectation?
- Dialog-state-related indicators / Stay_Here
  - Q: Can we expand this to some distance to an optimal dialog trace?
- Overall confidence in beliefs within a topic
  - Q: How to aggregate? Entropy-based measures? (see the sketch below)
- Allow for task-specific metrics of goodness
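One possible reading of "entropy-based measures" for aggregating within-topic confidence, sketched under the assumption that each concept carries a belief distribution; this is a speculative illustration, not a method fixed by the proposal.

```python
# Speculative sketch: aggregate "goodness" over a topic as the average
# certainty (1 - normalized entropy) of the concepts' belief distributions.
import math

def normalized_entropy(belief):
    """Entropy of a value distribution, scaled to [0, 1] (1 = maximally uncertain)."""
    values = [p for p in belief.values() if p > 0]
    if len(values) <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in values)
    return h / math.log(len(values))

def topic_confidence(concepts):
    """Average certainty across the concepts of a topic."""
    if not concepts:
        return 1.0
    return sum(1.0 - normalized_entropy(b) for b in concepts.values()) / len(concepts)

# Example: a room-reservation topic where the date is settled but the time is not.
room_topic = {
    "date": {"Tuesday": 0.9, "Thursday": 0.1},
    "time": {"10am": 0.4, "2pm": 0.35, "4pm": 0.25},
}
print(round(topic_confidence(room_topic), 2))
```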
42. Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
43. Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
44. Grounding Actions
- Design and evaluate a rich set of strategies for preventing and recovering from errors (both misunderstandings and non-understandings)
- Current status: few strategies used / analyzed
  - Explicit verification: "Did you say Pittsburgh?"
  - Implicit verification: "Traveling from Pittsburgh... when do you want to leave?"
45. Explicit / Implicit Verifications
- Analysis of user behavior following these 2 strategies: Krahmer (10), Swerts (11)
- User behavior is rich; correction detectors are important!
- Design is important!
  - "Did you say Pittsburgh?"
  - "Did you say Pittsburgh? Please respond yes or no."
  - "Do you want to fly from Pittsburgh?"
- Correct implementation / adequate support is important!
  - Users discovering errors through implicit confirmations are less likely to get back on track... hmm
46. Strategies for misunderstandings
- Explicit verification (w/ variants)
- Implicit verification (w/ variants)
- Disambiguation
  - "I'm sorry, are you flying out of Pittsburgh or San Francisco?"
- Rejection
  - "I'm not sure I understood what you said. Can you tell me again where you are flying from?"
47. Strategies for non-understandings - I
- Lexically entrain
  - "Right now I need you to tell me the departure city. You can say, for instance, 'I'd like to fly from Pittsburgh.'"
- Ask repeat
  - "I'm not sure I understood you. Can you repeat that, please?"
- Ask reformulate
  - "Can you please rephrase that?"
- Diagnose
  - If the source of the non-understanding can be known / estimated, give that information to the user
  - "I can't hear you very well. Can you please speak closer to the microphone?"
48. Strategies for non-understandings - II
- Select alternative plan: domain-specific strategies
  - E.g. try to get the state name first, then the city name
- Establish context (Confirm context variant)
  - "Right now I'm trying to gather enough information to make a room reservation. So far I know you want a room on Tuesday. Now I need to know for what time you need the room."
- Give targeted help
  - Give help on the topic / focus of the conversation / estimated user goal
- Constrain language model / recognition
49. Strategies for non-understandings - III
- Switch input modality (e.g. DTMF, pen, etc.)
- Restart topic / back up dialog
- Start over
- Switch to operator
- Terminate session
- ...
50. Grounding Strategies: To-Do List
- Design, implement, analyze, iterate
  - Human-human dialog analysis
  - Design the strategies, with variants and appropriate support
  - Implement in the RavenClaw framework
  - Perform data-driven analysis
    - Q: User behaviors
    - Q: Applicability conditions
    - Q: Costs, success rates
51. Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
52. Grounding decision model
- Decide which is the best grounding action to take at a certain time
- Goals / desired properties:
  - Domain-independent
  - Adaptive
    - Learn and target any dialog performance metric
    - Adjust to large variations in the reliability of inputs
    - Accept any new strategies on the fly
  - Scalable
53. Previous work
- Conversation as action under uncertainty: Horvitz (14), Paek (15)
  - Bayesian decision theory with assumed utilities
- Reinforcement learning in spoken dialog systems: Kearns (16), Singh (17), Pieraccini (18), Litman (19), Walker (20)
  - Learning dialog policies
- Heuristic approaches [add refs]
  - Predominant in today's systems
54. Grounding: Decision-Theoretic Approach
- Given:
  - A set of states S = {s} and a probabilistic model of state given some evidence e, P(s|e)  ← grounding state indicators
  - A set of actions A = {a}  ← grounding actions
  - A model describing the utility of each action from each state, U(s,a)  ← grounding model
- Take the action that maximizes expected utility:
  - EU(a|e) = Σ_s U(a,s) · P(s|e)   (toy example below)
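A toy instance of this decision rule, with invented states, actions, utilities, and belief; it is purely illustrative and not the proposal's actual model.

```python
# Toy illustration of the decision rule EU(a|e) = sum_s U(a,s) * P(s|e).
# States, actions, utilities, and the belief are invented for illustration.

# P(s|e): belief over whether the top concept hypothesis is correct,
# as produced by the grounding state indicators.
belief = {"correct": 0.55, "incorrect": 0.45}

# U(a,s): utility of each grounding action in each state (made-up numbers).
utilities = {
    "no_action":        {"correct": 1.0, "incorrect": -3.0},
    "implicit_confirm": {"correct": 0.8, "incorrect": -1.0},
    "explicit_confirm": {"correct": 0.2, "incorrect":  0.5},
}

def best_grounding_action(belief, utilities):
    """Pick the action maximizing expected utility under the current belief."""
    def expected_utility(action):
        return sum(utilities[action][s] * p for s, p in belief.items())
    return max(utilities, key=expected_utility)

# With confidence this low, verifying explicitly wins over accepting silently.
print(best_grounding_action(belief, utilities))   # -> explicit_confirm
```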
55. The missing ingredient: utilities
56. Learning utilities
- Essentially a POMDP problem
  - Hidden state
    - Belief dictated by the grounding state indicator models
  - Actions
    - Strategies
  - Rewards
    - Targeted optimization measures (see the estimation sketch below)
- [Figure: grounding-action diagram with labels EV, IC, IV, NGA, C, U]
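One simple way utilities could be estimated from logged dialogs, sketched here as a Monte Carlo average over discretized belief bins; this is an assumption for illustration, and the proposal's actual learning method (e.g. a POMDP solver or reinforcement learning) may differ.

```python
# Speculative sketch of estimating U(s,a) from logged dialogs: discretize the
# belief (confidence) into a few state bins and average the observed reward
# for each (state, action) pair. Logged data below is invented.
from collections import defaultdict

def belief_bin(confidence):
    return "low" if confidence < 0.4 else "medium" if confidence < 0.7 else "high"

# Logged (confidence, grounding action, observed reward) triples.
log = [
    (0.85, "no_action", +1.0), (0.30, "no_action", -3.0),
    (0.35, "explicit_confirm", +0.5), (0.80, "explicit_confirm", +0.2),
    (0.55, "implicit_confirm", +0.6), (0.45, "implicit_confirm", -0.5),
]

totals, counts = defaultdict(float), defaultdict(int)
for confidence, action, reward in log:
    key = (belief_bin(confidence), action)
    totals[key] += reward
    counts[key] += 1

utilities = {key: totals[key] / counts[key] for key in totals}
for (state, action), value in sorted(utilities.items()):
    print(f"U({state}, {action}) = {value:+.2f}")
```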
57. A possible overall architecture
- Two types of grounding models:
  - For misunderstandings: one grounding model per concept
  - For non-understandings: one grounding model per agent
- [Dialog task tree figure, as on slide 24]
58. A possible overall architecture
- Q: How to combine the decisions?
  - Identify a small set of rules
    - E.g. concepts first, then agents, focused-to-top
  - Hierarchical POMDP approaches: Roy, Pineau, Thrun
- [Dialog task tree figure, as on slide 24]
59. A possible overall architecture
- Q: Formulate a parallel learning problem?
  - Large numbers of small models are good in principle
  - Need to clearly identify the assumptions
- Or a hierarchical learning problem
- [Dialog task tree figure, as on slide 24]
60. Proposed Work, in Detail - Outline
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation / updating)
   - correction detection
   - goodness-of-dialog metrics
   - other user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point
61. Evaluation
- [Figure: reduction in interaction breakdowns and in sessions containing misunderstandings, as on slide 7]
62. Evaluation
- Evaluate the proposed framework across a large variety of domains
  - RoomLine, BusLine, LARRI, TeamTalk, etc.
- Grounding state indicators evaluation
  - Internal metrics, e.g. accuracy, etc.
- Grounding strategies analysis
  - Empirical analysis
  - Quantitative assessments: costs, success rates
  - Qualitative insights: user behaviors, best variants
63. Evaluation
- Grounding model / framework evaluation (in terms of the chosen performance metric)
  - Against an expert heuristic strategy
  - Against a smaller number of strategies
  - Against a non-adaptive system
64. ? and !