Title: Improving User Interaction with Spoken Dialog Systems via Shaping
1 Improving User
Interaction with Spoken Dialog Systems via Shaping
15 December 2006 PhD Thesis Defense Language
Technologies Institute School of Computer
Science Carnegie Mellon University
2Thesis committee
- Roni Rosenfeld, chair
- Alex Rudnicky
- Alex Waibel
- Candy Sidner, MERL
3Question
- How can we help users have more efficient
interactions with spoken dialog systems? - Shaping teach users to say what the system
understands - When users say non-Speech Graffiti things, help
them learn the Speech Graffiti version
when is that showing at the Manor?
4Thesis statement
- Shaping can be used to induce more efficient user
interactions with spoken dialog systems. - The shaping strategy can improve efficiency by
increasing the amount of user input that is
actually understood by the system, leading to
increased task completion rates and higher user
satisfaction. - This strategy can also reduce upfront training
time, thus accelerating the process of realizing
more efficient interaction.
5Roadmap
- Speech Graffiti basics
- Related work, in brief
- Proposed strategy
- Evaluation three user studies
- Summary, conclusions discussion
6Speech Graffiti (i)
- A protocol for structured interaction with simple
machines - Why "Speech Graffiti?"
- User adapts input style
- Result is easier for system to process
speech graffiti basics related work proposed
strategy evaluation summary
7Speech Graffiti (ii)
- Addresses several issues with spoken dialog
systems - Clarifies system boundaries
- Simplifies development
- Allows flexible, direct access to data
- Provides universal interaction protocol
speech graffiti basics related work proposed
strategy evaluation summary
8User input
Speech Graffiti-based language model
Speech Graffiti-based speech recognition
hypothesis
is input Speech Graffiti?
Theater is Showcase North theater
Dramas
yes
no
give terse confirmation or query result
give error beep
Showcase Cinemas Pittsburgh North
error beep
speech graffiti basics related work proposed
strategy evaluation summary
9Baseline Speech Graffiti interaction
USER Theater is Showcase North theater SYSTEM
Showcase Cinemas Pittsburgh North.
Dramas beep Genre is drama Drama. What
movies are playing? beep Where was I?
Theater is Showcase Cinemas Pittsburgh North,
genre is drama. What is the title? 2 titles
Flags of Our Fathers, The Departed. Start
over Starting over. Area is North Hills South
Side Scratch that Scratched Area is North Hills,
title is The Illusionist North Hills, The
Illusionist
speech graffiti basics related work proposed
strategy evaluation summary
10Earlier Speech Graffiti evaluation
- Compared Speech Graffiti with an NL system in the
same domain (movies) (N23) - On average, Speech Graffiti users had
- Higher user satisfaction
- Lower error rates
- Lower task completion times
- Similar task completion rates
- Some users just didn't get it (N6)
speech graffiti basics related work proposed
strategy evaluation summary
11Related work
- Restricted / subset / structured languages are a
reasonable approach to HCI - Kelly (77), Black Moran (82), Jackson (83),
Sidner Forlines (02), etc. - Humans adapt conversation at many levels
- Pickering Garrod (04), etc.
- Zoltan-Ford (91), Brennan (96), Bell (03),
etc. - Convergence the process of interaction
adaptation whereby one partner adopts behavior
that is increasingly similar to that of the other
partner - (Burgoon, Stern, Dillman, 95)
speech graffiti basics related work proposed
strategy evaluation summary
12User input
is input Speech Graffiti?
dramas
no
theater is Showcase North theater
yes
is input shapeable?
give terse confirmation or query result
no
yes
what
give error beep
give shaping confirmation
Showcase Cinemas Pittsburgh North
something to encourage user to say genre is
drama
speech graffiti basics related work proposed
strategy evaluation summary
13Expanded grammar
- Include a grammar that accepts more natural
language input compared to Speech Graffiti - This is still not a full natural language grammar
- Exploit the idea that knowledge of speaking to a
restricted-language system limits input - 2-pass ASR process
speech graffiti basics related work proposed
strategy evaluation summary
14Evaluations
- Series of three user studies
- Some iterative design changes
- MovieLine
- Current info about Pittsburgh theaters and movies
speech graffiti basics related work proposed
strategy evaluation summary
15User Study I baseline vs. simple shaping
- Is a simple, adaptation-theoretic shaping
strategy effective in increasing efficiency? - How well does 2-pass strategy work?
- Generate corpus to inform shaping strategy
refinements
speech graffiti basics related work proposed
strategy evaluation summary
16Simple shaping confirmation
- Adaptation lexical entrainment is common in H-H
and H-C interaction - Maybe this will work here?
- Confirm expanded language input with Speech
Graffiti slotvalue form
USER Manor theater movie types
USER Manor theater movie types SYSTEM theater
is Cinemagic Manor theater, requesting genre. 5
matches biography, comedy, crime,
speech graffiti basics related work proposed
strategy evaluation summary
17Study I participants conditions
- 29 native Amer. Eng. speakers, ages 23-54, little
to no computer programming exp. - Between-subjects, single domain (MovieLine)
- ORIGINALtutorial (baseline Speech Graffiti, no
shaping) - SIMPLEtutorial
- SIMPLEno_tutorial
speech graffiti basics related work proposed
strategy evaluation summary
18 sample ORIGINAL interaction Theater Manor,
genre is comedy Cinemagic Manor Theatre,
comedy What are movies? 3 matches Friends with
Money, Thank You for Smoking, Tsotsi Galleria
beep Theater is Galleria Carmike
Galleria 6 Genre is drama, whats playing?
confsig, drama Where was I? Theater is Carmike
Galleria 6, genre is drama What beep
sample SIMPLE interaction Theater Manor, genre
is comedy Theater is Cinemagic Manor Theatre,
genre is comedy What are movies? Requesting
movie. 3 matches Friends with Money, Thank You
for Smoking, Tsotsi Galleria Theater is
Carmike Galleria 6 Theater is Galleria
Theater is Carmike Galleria 6 Genre is drama,
whats playing? Genre is drama, requesting
movie. Sorry, there are no
matches. Where was I? Theater is Carmike Galleria
6, genre is drama, what is movie? What
beep
speech graffiti basics related work proposed
strategy evaluation summary
19Study I setup
- 15 MovieLine tasks
- You want to see Fantastic Four at the Norwin
Hills theater. Find out when its showing there. - Completed in lab, via telephone (20-40mins.)
- SASSI-based user satisfaction survey (Hone
Graham, 00) - Seven user satisfaction factors
- System response accuracy, Likeability, Cognitive
demand, Annoyance, Habitability, Speed, TTS -
speech graffiti basics related work proposed
strategy evaluation summary
20Study 1 results efficiency user satisfaction
- No significant differences between ORIGINAL
SIMPLE - Trend towards greater efficiency for SIMPLE
- Higher task completion
- Lower median time turns on task
- Trend towards greater satisfaction for SIMPLE
Original Simple
21Study 1 results tutorial
- No significant differences between
SIMPLEtutorial and SIMPLEno-tutorial - Pre-use tutorial is not necessary
speech graffiti basics related work proposed
strategy evaluation summary
22Study 1 results grammaticality
- No significant differences
- Intrasession grammaticality is key
- Evidence of convergence
- Significant within-subj. grammaticality increases
for both groups - ORIGINAL group increased significantly more
sharply - Stronger correlation between grammaticality and
user satisfaction task success for ORIGINAL
Original Simple
23User study II more-explicit shaping
- Can more-explicit shaping strategies have an
effect on efficiency and Speech Graffiti
convergence?
speech graffiti basics related work proposed
strategy evaluation summary
24Study II participants conditions
- 30 native Amer. Eng. speakers, ages 21-54, little
to no computer programming exp. - Between-subjects, single domain (MovieLine)
- SIMPLE
- SUGGESTING
- REQUIRING
speech graffiti basics related work proposed
strategy evaluation summary
25SUGGESTING prompt
- Give a Speech Graffiti example
- Encourage user to speak that way next time
- Not encourage immediate repeating
- Not encourage a yes/no response
- Leave open the possibility that there was a
recognition error
USER Manor theater movie types SYSTEM I think I
heard Manor the movie types. Next time it would
help to use Speech Graffiti, as in theater is
Cinemagic Manor theater, list genres. Listing
5 genres biography, comedy, crime,
USER Manor theater movie types
26REQUIRING prompt
- Give a Speech Graffiti example
- Ask the user to rephrase immediately
- Leave open the possibility that there was a
recognition error
USER Manor theater movie types
USER Manor theater movie types SYSTEM Please
rephrase that using Speech Graffiti. For
example, theater is Cinemagic Manor theater,
list genres.
27Study 1I results efficiency user satisfaction
- No sig. differences between 3 conditions
- Satisfaction scores for REQUIRING tended to be
lowest - Habitability lowest for SIMPLE
Simple Suggesting Requiring
speech graffiti basics related work proposed
strategy evaluation summary
28Study 1I results global grammaticality
- No significant differences
- Evidence of convergence
- Significant within-subject
- grammaticality increases
- for all groups
Simple Suggesting
Requiring
speech graffiti basics related work proposed
strategy evaluation summary
29Study II initial grammaticality
- Consider how quickly users become proficient in
Speech Graffiti - Target grammaticality level for success 80
- Low initial grammaticality
- lt 80 in 1st quarter
- High initial grammaticality
- 80 in 1st quarter
speech graffiti basics related work proposed
strategy evaluation summary
30Study II initial grammaticality (ii)
- REQUIRING condition
- Good for low-initial-
grammaticality users - Helps them know what to say
- Not so good for high-initial-grammaticality users
- Not error-robust, so confusing and/or annoying
- ? suggested a more flexible approach
Simple Suggesting
Requiring
USER theater is Cinemagic Manor theater SPEECH
RECOGNITION HYPOTHESIS cinemagic manor
theater SYSTEM Please rephrase that using
Speech Graffiti. For example, theater is
Cinemagic Manor theater.
31User study III adaptive shaping
- How does an adaptive shaping strategy affect
interaction efficiency? - How does interaction efficiency change over the
course of a users experience with a system? - How well do users transfer skills from one Speech
Graffiti application to another?
speech graffiti basics related work proposed
strategy evaluation summary
32Study III participants conditions
- 27-22 native Amer. Eng. speakers, ages 23-54,
little to no computer programming exp. - Two conditions
- SUGGESTING (now including ASR hyp. only on every
3rd trigger) - ADAPTIVE
speech graffiti basics related work proposed
strategy evaluation summary
33Study III ADAPTIVE shaping
- Users start with REQUIRING version
- After proficiency established, shift to SIMPLE
- Shift back to REQUIRING if user is excessively
ungrammatical
USER theater is Cinemagic Manor theater SPEECH
RECOGNITION HYPOTHESIS cinemagic manor
theater SYSTEM theater is Cinemagic Manor
theater.
speech graffiti basics related work proposed
strategy evaluation summary
34Study III setup longitudinal
- 1st session in lab, 8 tasks
- Sessions 2-6 independent, 1 week apart
- Four tasks in sessions 2, 3, 4 6 six in
session 5 - SASSI-based user satisfaction survey
- After sessions 1,4 6
speech graffiti basics related work proposed
strategy evaluation summary
35Study III setup cross-domain
- Sessions 1 through 4 MovieLine
- Sessions 5 6 DineLine
- Pittsburgh restaurant info
- Same of slots (9) as MovieLine
- Non-shaping
- Can shaping apps be training apps?
Area is South Side. South Side.Cuisine is
American. American.List restaurants. Listing 7
restaurants City Grill, Hot Metal Grille,
Marios Southside Saloon, and more.
speech graffiti basics related work proposed
strategy evaluation summary
36Study III results efficiency
- Session 1, between subjects
- Similar task completion
- Generally lower median time/turns for ADAPTIVE
- Significantly higher time/turns-to-completion for
ADAPTIVE
Adaptive Suggesting
Adaptive
Suggesting
Adaptive Suggesting
speech graffiti basics related work proposed
strategy evaluation summary
37Study III results longitudinal cross-domain
- Task completion
- Increased from S1?S4
- Decreased to S6, significantly so for SUGGESTING
- SUGGESTING users completed sig. fewer DineLine
tasks - Time- and turns-to-completion
- Decreased S1?S4, increased to S6
- No real difference between initial MovieLine
initial DineLine - Median time- turns-on-task sig. lower for
ADAPTIVE DL
38Study III results user satisfaction
Adaptive Suggesting
Adaptive Suggesting
Adaptive Suggesting
speech graffiti basics related work proposed
strategy evaluation summary
39Study III results grammaticality
- Significant intrasession increases for whole
population - Significant increases S1?S4, S5, S6
- Somewhat stronger S1?S5 for ADAPTIVE (p 0.13)
- 80 threshold
- S1 6 users (22)
- S5 16 users (70)
speech graffiti basics related work proposed
strategy evaluation summary
40Evaluation summary
- User Study I
- Trend towards increased efficiency satisfaction
for shaping - Successful interactions without tutorial
- Two-pass ASR successful
- Overall intrasession convergence
- User Study II
- Overall intrasession convergence
- REQUIRING
- Strong local convergence
- Trend towards lower satisfaction, non-robust to
errors - SIMPLE lower habitability
- User Study III
- Overall intra- intersession convergence
- Cross-domain transfer over all participants
- to a standard Speech Graffiti application with
no tutorial - ADAPTIVE
- More time/turns for completed tasks in initial
session - Increased task completion across domains
likeability over time, trend towards greater
grammaticality change
speech graffiti basics related work proposed
strategy evaluation summary
41Thesis statement, revisited
- Shaping can be used to induce more efficient user
interactions with spoken dialog systems. - The shaping strategy can improve efficiency by
increasing the amount of user input that is
actually understood by the system, leading to
increased task completion rates and higher user
satisfaction. - Significantly reduced concept error (Study I) ?
higher task completion, lower median time/turns
(p lt .25), higher mean satisfaction scores - This strategy can also reduce upfront training
time, thus accelerating the process of realizing
more efficient interaction. - Pre-use tutorial not necessary (Study 1)
- Also supports cross-domain skill transfer
(significant increase in initial grammaticality
with new system)
speech graffiti basics related work proposed
strategy evaluation summary
42Contributions (i)
- Investigation of specific shaping strategies
- SIMPLE strategy low habitability, especially for
low-initial-grammaticality users - REQUIRING strategy by far the strongest effect
on local convergence, but non-robust to ASR
errors - ADAPTIVE strategy initially lower efficiency,
but supports better cross-domain performance - Users generally exhibited intrasession
grammaticality increases regardless of the
particular shaping strategy, attesting to the
power of convergence as a general phenomenon.
speech graffiti basics related work proposed
strategy evaluation summary
43Contributions (ii)
- More efficient interactions
- Shaping can eliminate need for tutorial, without
a corresponding decline in interaction
efficiency. - Integration of shaping and the two-pass
recognition process allows users to complete
tasks while using natural language and learning
the Speech Graffiti format. - As successive iterations of shaping strategies
have been implemented, mean user satisfaction
scores have risen, indicating more effective
interactions. - ? More efficient interactions for more users
speech graffiti basics related work proposed
strategy evaluation summary
44Contributions (iii)
- Demonstration of a spoken dialog system that
- Is fully functional (not Wizard-of-Oz)
- Is not directed-dialog
- Accesses real-world data
- Andleverages users propensity for convergence
speech graffiti basics related work proposed
strategy evaluation summary
45Future work / extensions
- Public system, á là Lets Go!
- Visual aids
- From reference cards to multimodal systems
- More complex domains (10s-100s of slots)
- Integration into natural language system
- Shape to acoustically preferred/unambiguous input?
speech graffiti basics related work proposed
strategy evaluation summary
46Time for discussion
47References
- Bell, L. (2003). Linguistic adaptations in spoken
human-computer dialogues Empirical studies of
user behavior. (PhD thesis, KTH, Stockholm). - Black, J.B. and Moran, T.P. (1982.) Learning and
Remembering Command Names. In Proceedings of the
Conference on Human Factors in Computing Systems,
pp. 8-11. - Brennan, S.E. (1996.) Lexical entrainment in
spontaneous dialog. In Proceedings of the
International Symposium on Spoken Dialogue, pp.
41-44. - Burgoon, J.K., Stern, L.A., Dillman, L. (1995).
Interpersonal adaptation Dyadic interaction
patterns. Cambridge Cambridge University Press. - Jackson, M.D. (1983.) Constrained Languages Need
Not Constrain Person/Computer Interaction.
SIGCHI Bulletin, 15(2-3)18-22. - Kelly, M. (1977). Limited vocabulary natural
language dialogue. International Journal of
Man-Machine Studies, 9, 479-501. - Pickering, M.J. Garrod, S. (2004). Toward a
mechanistic psychology of dialogue. Behavioral
and Brain Sciences, 27, 169-226. - Sidner, C. and Forlines, C. (2002.) Subset
Languages for Conversing with Collaborative
Interface Agents. In Proceedings of the 7th
International Conference on Spoken Language
Processing (ICSLP), Denver, CO, pp. 281-284. - Zoltan-Ford, E. (1991.) How to get people to say
and type what computers can understand.
International Journal of Man-Machine Studies,
34527-547.
48System architecture
49Study II setup
- 15 MovieLine tasks
- You want to see Fantastic Four at the Norwin
Hills theater. Find out when its showing there. - Completed in lab, via telephone (20-40mins.)
- SASSI-based user satisfaction survey
- Seven user satisfaction factors
- System response accuracy, Likeability, Cognitive
demand, Annoyance, Habitability, Speed, TTS -
50Study 1I results efficiency
- Efficiency measures no significant differences
between conditions
Simple Suggesting Requiring
Simple Suggesting Requiring
Simple Suggesting Requiring
51Study III results intrasession grammaticality
- Significant increases for whole study population
- Mean change
- ADAPTIVE 14.8
- EXPLICIT 6.05 (p0.27)
52Study 1 results efficiency
- Efficiency measures no significant differences
between ORIGINAL SIMPLE conditions - Trend towards greater efficiency for SIMPLE
- Higher task completion
- Lower median time turns on task
53Study 1 results two-pass ASR
- Two-pass ASR strategy significantly reduces
non-understandings
speech graffiti basics related work proposed
strategy evaluation summary
54Study 1 results two-pass ASR (ii)
- Lower WER for Speech Graffiti vs. expanded
speech graffiti basics related work proposed
strategy evaluation summary
55Changes to the system
- No tutorial
- Added targeted help for
- Long utterances
- Context clearing
- List navigation
- New query format
- list slot instead of what is slot?
- list theaters, list genre, etc.
speech graffiti basics related work proposed
strategy evaluation summary
56User input
Speech Graffiti-based language model
expanded-grammar-based language model
expanded-grammar-based speech recognition
hypothesis
Speech Graffiti-based speech recognition
hypothesis
selection scheme
is input Speech Graffiti?
yes
no
is input shapeable?
give SG confirmation or query result
yes
no
give shaping confirmation
give shaping help
speech graffiti basics related work proposed
strategy evaluation summary
57Study II examples
Theater Manor, genre is comedy Cinemagic Manor
Theatre, comedy What are movies? I think I heard
what are movies. Next time, it would help to
use Speech Graffiti, as in list title. Listing
2 titles For Your Consideration, The
Queen Galleria I think I heard, Galleria. Next
time, it would help to use Speech Graffiti, as in
theater is Carmike Galleria 6. Theater is
Galleria Carmike Galleria 6 Genre is drama,
whats playing? I think I heard genre is drama,
what's playing. Next time, it would help to use
Speech Graffiti, as in genre is drama, list
title. Listing 3 titles for drama, Carmike
Galleria 6 A Good Year, Babel, The
Departed. Where was I? Genre is drama, theater
is Carmike Galleria 6 List beep
Theater Manor, genre is comedy Cinemagic Manor
Theatre, comedy What are movies? Listing 2
titles For Your Consideration, The
Queen Galleria Theater is Carmike Galleria
6 Theater is Galleria Carmike Galleria 6 Genre
is drama, whats playing? Genre is drama,
listing 3 titles for drama, Carmike Galleria 6
A Good Year, Babel, The Departed. Where was
I? Genre is drama, theater is Carmike Galleria
6 List beep
SIMPLE
SUGGESTING
Theater Manor, genre is comedy Cinemagic Manor
Theatre, comedy What are movies? Please rephrase
that using Speech Graffiti. For example, list
title. Galleria Please rephrase that using
Speech Graffiti. For example, theater is Carmike
galleria 6. Theater is Galleria Carmike
Galleria 6 Genre is drama, whats
playing? Please rephrase that using Speech
Graffiti. For example, genre is drama, list
title. Where was I? Genre is comedy, theater is
Carmike Galleria 6. List beep
REQUIRING
58Study II local convergence
- REQUIRING condition generated more local
Speech Graffiti-grammatical input
Suggesting Requiring
Simple Suggesting
Requiring
speech graffiti basics related work proposed
strategy evaluation summary