Title: A Framework For Developing Conversational User Interfaces
- James Glass, Eugene Weinstein, Scott Cyphers, Joseph Polifroni: MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Grace Chung: Corporation for National Research Initiatives, Reston, VA, USA
- Mikio Nakano: NTT Corporation, Atsugi, Japan
Conversational User Interfaces
[Figure: a human and a computer communicating through speech]
Types of Conversational Interfaces
- Conversational systems differ in the degree to which the human or the computer controls the conversation (initiative)
  - Directed dialogue
  - Free-form dialogue
  - Mixed-initiative dialogue
Conversational Interfaces
- Can understand verbal input
  - Speech recognition
  - Language understanding (in context)
- Can engage in dialogue with a user during the interaction
- Can verbalize responses
  - Language generation
  - Speech synthesis
[Figure: system architecture linking audio input and output, speech recognition, language understanding, context resolution, dialogue management, a back end, language generation, and speech synthesis]
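The components listed above form a chain in which each stage consumes the previous stage's output. The sketch below is a toy, text-only stand-in for that chain: recognition and synthesis are stubs (text in, text out), the weather domain is invented for illustration, and all function names and the dictionary "frame" format are hypothetical rather than SpeechBuilder's actual interfaces.

```python
from typing import Callable, Dict, List

Frame = Dict[str, str]

def recognize(audio: str) -> str:
    # Stub recognizer: the "audio" is already a transcript; just normalize it.
    return audio.lower().strip(" ?.")

def understand(text: str) -> Frame:
    # Toy understanding: label the action and spot a known city.
    frame: Frame = {"action": "weather" if "weather" in text else "unknown"}
    for city in ("boston", "seattle"):
        if city in text.split():
            frame["city"] = city.title()
    return frame

def resolve_context(frame: Frame, history: List[Frame]) -> Frame:
    # Inherit the city from the previous turn when this turn does not mention one.
    if "city" not in frame and history and "city" in history[-1]:
        return {**frame, "city": history[-1]["city"]}
    return frame

def manage_dialogue(frame: Frame, backend: Callable[[Frame], str]) -> Frame:
    # Ask for missing information; otherwise consult the back end.
    if "city" not in frame:
        return {"reply": "prompt_for_city"}
    return {"reply": "answer", "city": frame["city"], "result": backend(frame)}

def generate(reply: Frame) -> str:
    if reply["reply"] == "prompt_for_city":
        return "Which city?"
    return f"In {reply['city']}, {reply['result']}."

def synthesize(text: str) -> str:
    # Stub synthesizer: returns the text that would be spoken.
    return text

def run_turn(audio: str, history: List[Frame], backend: Callable[[Frame], str]) -> str:
    frame = resolve_context(understand(recognize(audio)), history)
    history.append(frame)
    return synthesize(generate(manage_dialogue(frame, backend)))
```

With a back end that always answers "it is sunny", the turn "What is the weather in Boston?" yields "In Boston, it is sunny.", and a follow-up "And the weather tomorrow?" gets its city from the context-resolution stage rather than from the utterance.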
The Problem With Conversational Interfaces
- Advanced conversational systems exist
  - Both user and computer can take the initiative
  - Goal: the conversational skill of the system should approach that of a human operator
- But:
  - These systems are built by experts
  - There is a huge learning curve for novices, and
  - Tremendous iterative effort is required even from experts
- For this reason:
  - Most advanced conversational systems remain in research labs, e.g., the Jupiter weather information system (1-888-573-TALK) (Zue et al., IEEE Trans. SAP, 8(1), 2000)
  - However, we have seen limited commercial deployment, e.g., AT&T's How May I Help You (Gorin et al., Speech Communication, 23, 1997)
Simplifying Conversational System Creation
- Goal: make it easier for both expert and novice developers to create conversational interfaces
  - But still use advanced human language technologies
- Strategy: simplify the configuration process
  - Automatically configure technology components based on examples
  - Allow specification through a web interface or a unified configuration file
[Figure: the SpeechBuilder configuration engine, configured through a web interface or a configuration file]
Configuring a Conversational Interface: Knowledge Representation
- First, define example sentences for in-domain actions
- Then, define the important concepts present in the actions (attributes)
  - Concept values make up the recognizer vocabulary!
- Examples of attributes are automatically matched to attribute classes
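A minimal sketch of the two ideas on this slide: attribute values contribute to the recognizer vocabulary, and values occurring in example sentences are matched back to their attribute classes. The data layout, the function names, and the whole-word regular-expression matching are assumptions made for illustration, not SpeechBuilder's actual algorithm; the cuisine and city values come from the restaurant example used in this deck.

```python
import re
from typing import Dict, List, Set, Tuple

def build_vocabulary(actions: Dict[str, List[str]],
                     attributes: Dict[str, List[str]]) -> Set[str]:
    """Words from the example sentences plus every word of every attribute value."""
    vocab: Set[str] = set()
    for sentences in actions.values():
        for sentence in sentences:
            vocab.update(sentence.lower().split())
    for values in attributes.values():
        for value in values:
            vocab.update(value.lower().split())
    return vocab

def tag_example(sentence: str,
                attributes: Dict[str, List[str]]) -> Tuple[str, Dict[str, str]]:
    """Replace known attribute values with their class, e.g. 'arlington' -> '<city>'."""
    text = sentence.lower()
    found: Dict[str, str] = {}
    # Longest values first, so a multi-word value wins over a word inside it.
    pairs = sorted(((v.lower(), cls) for cls, vals in attributes.items() for v in vals),
                   key=lambda pair: len(pair[0]), reverse=True)
    for value, cls in pairs:
        pattern = r"\b" + re.escape(value) + r"\b"
        if re.search(pattern, text):
            found[cls] = value
            text = re.sub(pattern, f"<{cls}>", text)
    return text, found
```

Tagging "Can you show me a Chinese restaurant in Arlington" with cuisine and city classes produces the generalized sentence "can you show me a <cuisine> restaurant in <city>", which is the kind of abstraction that lets one example cover every listed cuisine and city.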
Starting with a Database Table
- Provide a database table to configure the speech interface
- Only some columns are used to access entries (e.g., Name)
  - Values of those columns become values for domain concepts
  - Default action sentences are automatically generated
- But every table cell can potentially be an answer to a question
  - The column names become values of one concept (property)
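A sketch of how a table could be turned into domain concepts and default action sentences. The choice of access column, the sentence template, and the `property` concept holding the names of the non-access columns are illustrative assumptions; the first row reuses the directory entry from this deck, and the second row is made up.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]

def table_to_domain(rows: List[Row], access_column: str) -> Dict[str, List[str]]:
    columns = list(rows[0])
    # Values of the access column become the values of a domain concept.
    access_values = sorted({row[access_column] for row in rows})
    # The remaining column names become the values of one "property" concept.
    properties = [c for c in columns if c != access_column]
    # Hypothetical default action sentences, one per property.
    sentences = [f"what is the {p} for <{access_column}>" for p in properties]
    return {access_column: access_values, "property": properties, "sentences": sentences}

def lookup(rows: List[Row], access_column: str, value: str, prop: str) -> Optional[str]:
    # Any cell can be an answer: find the row by its access value, then read the column.
    for row in rows:
        if row[access_column] == value:
            return row.get(prop)
    return None
```

Because every non-access column becomes a property value, a question such as "what is the email for Jim Glass" reduces to a row lookup by name followed by a column read.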
Dialogue Management
- Generic Dialogue Manager (Polifroni & Chung, ICSLP 2002)
  - Plan system responses
  - Regularize common concepts
  - Summarize database results
[Figure: a generic dialogue manager shared by domains such as hotels, air travel, sports, and weather, placed in the system architecture (audio, speech recognition, language understanding, context resolution, dialogue management, language generation, speech synthesis, back end)]
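One way a dialogue manager might summarize a long database result instead of reading every entry aloud: list short results in full, and for long ones report the count and the most common values of one attribute. The threshold, field names, and phrasing are assumptions for illustration, not the Generic Dialogue Manager's actual behavior.

```python
from collections import Counter
from typing import Dict, List

def summarize(results: List[Dict[str, str]], key: str, max_listed: int = 3) -> str:
    """List short results in full; summarize long ones by their most common values of `key`."""
    if not results:
        return "Sorry, there was no data matching your request."
    if len(results) <= max_listed:
        return "I have " + ", ".join(r["name"] for r in results) + "."
    counts = Counter(r[key] for r in results)
    top = ", ".join(f"{n} in {value}" for value, n in counts.most_common(2))
    return f"I have {len(results)} entries, including {top}."
```

Summarizing by a domain-independent rule like this is what allows the same manager to serve hotels, air travel, or weather without per-domain response code.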
Context Resolution
- Input query: Show me restaurants in Cambridge.
- Resolve deixis: What does this one serve?
- Resolve pronouns: What is their phone number?
- Inherit predicates: Are there any on Main Street?
- Incorporate fragments: What about Massachusetts Ave?
- Fill in default values: Give me directions from MIT.
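Deictic and pronominal references such as "this one" or "their" must be mapped to an entity mentioned earlier. A common simple heuristic, swapped in here purely for illustration (the slides do not describe the system's actual method), is to pick the most recently mentioned entity of a compatible type. The list of referring expressions and the (type, name) history format are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical list of referring expressions this sketch recognizes.
REFERRING = re.compile(r"\b(this one|that one|it|they|their|them)\b", re.IGNORECASE)

Entity = Tuple[str, str]  # (type, name), most recent mention last

def resolve_reference(utterance: str, history: List[Entity],
                      wanted_type: Optional[str] = None) -> Optional[str]:
    """Map a deictic or pronominal reference to the most recent compatible entity."""
    if not REFERRING.search(utterance):
        return None
    for entity_type, name in reversed(history):
        if wanted_type is None or entity_type == wanted_type:
            return name
    return None
```

The type filter matters: after "Show me restaurants in Cambridge", the most recent mention may be the city, but "their phone number" should resolve to a restaurant.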
Human Language Technology Details
- Approach: use the same technologies as deployed in our mainstream, more complex systems
- Speech recognizer (Glass, Computer Speech and Language, 2003)
  - Trained on 100 hours of mostly telephone speech
  - Word pronunciations supplied by a large dictionary, generated by rule, or provided by the developer
- Natural language understanding (Seneff, Computational Linguistics, 1992)
  - Hierarchical sentence grammar used to parse sentence hypotheses
  - Backs off to concept spotting when no full parse is found
- Language generation (Baptist & Seneff, ICSLP 2000)
  - Used for SQL (database query) generation, paraphrasing, URL-encoding the meaning representation, and responses
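The back-off from full parsing to concept spotting can be illustrated with a toy. A single regular expression stands in for the hierarchical sentence grammar (a simplification assumed for brevity); when it fails, the understander falls back to spotting known concept values anywhere in the hypothesis. The concept tables and the `parse` marker in the frame are hypothetical.

```python
import re
from typing import Dict, List

# Toy "full grammar": one sentence pattern for the flights action.
FULL_PARSE = re.compile(
    r"^(?:find|show) me (?:a )?flights? to (?P<to_city>\w+)(?: on (?P<day>\w+))?$")

CONCEPTS: Dict[str, List[str]] = {
    "to_city": ["boston", "denver"],
    "day": ["monday", "tuesday"],
}

def understand(hypothesis: str) -> Dict[str, str]:
    text = hypothesis.lower().strip()
    match = FULL_PARSE.match(text)
    if match:
        frame = {"action": "flights", "parse": "full"}
        frame.update({k: v for k, v in match.groupdict().items() if v})
        return frame
    # Back off: spot concept values anywhere in the (possibly ungrammatical) hypothesis.
    frame = {"parse": "spotted"}
    for concept, values in CONCEPTS.items():
        for value in values:
            if re.search(rf"\b{value}\b", text):
                frame[concept] = value
    return frame
```

The spotted frame carries no action, which is one reason a real system still benefits from a full parse when one is available.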
Web-based Interface: Defining Actions and Concepts (Attributes)
Web-based Interface: Viewing Sentences
Examining how sentences are reduced to an action and a set of attribute-value pairs
Web-based Interface: Response Generation
Customizing system responses
Web-based Interface: Editing Pronunciations
Modifying system-generated pronunciations for the vocabulary
Web-based Interface: Context Resolution
Context resolution configured through masking and inheritance of concepts
Voice Configuration File: An Alternative to the Web Interface
- The entire domain can be specified in a single configuration file
- Allows for automated generation of conversational systems

  <actions>
    <request_name>
      i would like a restaurant
      can you (show|give) me a Chinese restaurant in Arlington
  </actions>
  <attributes>
    <cuisine> Chinese Taiwanese
    <city> Washington Boston Arlington
  </attributes>
  <discourse>
    name masks(city cuisine neighborhood)
  </discourse>
  <constraints>
    <request_name> (city|neighborhood) prompt_for_city
  </constraints>
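The constraints entry in this configuration example states that a request_name action needs a city or a neighborhood, and that otherwise the system should use prompt_for_city. A minimal sketch of checking such a constraint, assuming each constraint is represented as an (action, alternatives, prompt) tuple; that representation and the function name are hypothetical, not the internals of the configuration format.

```python
from typing import Dict, List, Optional, Tuple

# (action, concepts of which at least one must be present, prompt to issue otherwise)
Constraint = Tuple[str, List[str], str]

CONSTRAINTS: List[Constraint] = [
    ("request_name", ["city", "neighborhood"], "prompt_for_city"),
]

def missing_info_prompt(frame: Dict[str, str],
                        constraints: List[Constraint]) -> Optional[str]:
    """Return the prompt for the first unsatisfied constraint, or None if all are met."""
    for action, alternatives, prompt in constraints:
        if frame.get("action") == action and not any(c in frame for c in alternatives):
            return prompt
    return None
```

A request that names only a cuisine triggers prompt_for_city, while one that names a neighborhood passes straight through to the database query.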
Deployment
- SpeechBuilder has been functional for the past three years
- Some example domains:
  - Office appliance control
  - Laboratory directory (auto-attendant)
  - Restaurant query system
- Has been used by MIT researchers (experts) as well as novice developers at our sponsor companies
- Used in a technology transfer workshop for the pervasive computing project (Oxygen)
- SpeechBuilder has been used as an educational tool:
  - Computational linguistics class at Georgetown University
  - Summer class at Johns Hopkins University
- Youngest SpeechBuilder developer: 9 years old
Japanese SpeechBuilder
- Created in collaboration with NTT
- Challenge: segmentation (there are no spaces between words)
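Because Japanese text has no spaces, words cannot be found by splitting on whitespace. A classic baseline, shown here only to illustrate the problem (the slide does not say which method the Japanese SpeechBuilder uses), is greedy left-to-right longest-match segmentation against a word list. The function name and the tiny dictionary in the usage below are hypothetical.

```python
from typing import List, Set

def longest_match_segment(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right longest-match segmentation."""
    max_len = max((len(w) for w in dictionary), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            words.append(text[i])
            i += 1
    return words
```

With a dictionary containing 東京, の, 天気, 天, and 気, the string 東京の天気 ("Tokyo's weather") segments as 東京 / の / 天気, because the two-character 天気 is preferred over 天 followed by 気. Greedy matching can go wrong when a long word straddles a true boundary, so it is best read as a baseline.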
Example Domain
- A hotel application using the generic dialogue manager
- Compiled via SpeechBuilder using the constraints shown previously
- Other generic functionality is automatically included
- Illustrated technical issues:
  - Soliciting necessary information from the user
  - Interpreting fragments correctly in context
  - Canonicalizing relative dates
  - Ordering and summarizing the results of a query to the content provider
  - Resolving superlatives and updating the discourse context
  - Interpreting pronouns in context
  - Returning and speaking specific properties
  - Repeating previous replies
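Canonicalizing relative dates means turning expressions such as "tomorrow" or "next Tuesday" into absolute dates that a content provider can query. A small sketch using Python's `datetime`, assuming a fixed set of phrases and the convention that a bare or "next" weekday means the first such day strictly after today; real systems must settle that ambiguity explicitly, and the function name is hypothetical.

```python
import datetime as dt
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

def canonicalize_date(phrase: str, today: dt.date) -> Optional[str]:
    """Map a relative date phrase to an ISO date string, or None if unrecognized."""
    words = phrase.lower().split()
    if words == ["today"]:
        target = today
    elif words == ["tomorrow"]:
        target = today + dt.timedelta(days=1)
    elif words and words[-1] in WEEKDAYS and words[:-1] in ([], ["on"], ["next"]):
        # Assumed convention: the first such weekday strictly after today.
        # date.weekday() counts Monday as 0, matching the WEEKDAYS list.
        delta = (WEEKDAYS.index(words[-1]) - today.weekday()) % 7 or 7
        target = today + dt.timedelta(days=delta)
    else:
        return None
    return target.isoformat()
```

Under this convention, asking on a Monday about "Monday" refers to the following week; a different policy would only change the `or 7` fallback.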
Another Example Domain: Object Manipulation System
- Stock SpeechBuilder domain for spoken dialogue
- Custom back end connected to a stereo camera and a person-tracking algorithm (Demirdjian, WOMOT 2003)
Ongoing and Future Work
- Incorporate speech synthesis
  - Allow use of a concatenative speech synthesizer (Yi et al., ICSLP 2000) in SpeechBuilder
- Allow use of multiple modalities
  - Provide functionality to incorporate multimodal input into systems
- Improve dialogue management tools and modules
  - Improve the ability of SpeechBuilder systems to use more sophisticated dialogue strategies
  - Provide additional generic semantic concepts for use in domains
- Allow system refinement by unsupervised learning
  - Use confidence scores to improve the domain language model (Nakano & Hazen, Eurospeech 2003)
- Allow system modification in real time
  - Requires the ability to re-train the recognizer at runtime (Schalkwyk et al., Eurospeech 2003)
Thank You! For more information:
- http://www.sls.csail.mit.edu/
- Email us! ecoder@mit.edu
- Jupiter weather information system
  - 1-617-258-0300 (outside USA)
  - 1-888-573-TALK (USA toll-free)
- Mercury flight information system
  - 1-617-258-6040 (outside USA)
  - 1-877-MIT-TALK (USA toll-free)
- Pegasus flight status system
  - 1-617-258-0301 (outside USA)
  - 1-877-LCS-TALK (USA toll-free)
THE END
- Utility for rapid prototyping of speech-based interfaces
- Used to create demonstrations for the NTT CS Labs open house
- Prototypes were developed with a few days of effort
- Three papers submitted for publication
Human Language Technologies
- Only some columns are used to access entries (e.g., Name)
  - Values of those columns become values for domain concepts
  - Default action sentences are automatically generated
- But every table cell can potentially be an answer to a question
  - The names of the non-access columns become values of a concept
To Configure Response Generation
- For each concept present in the domain, define how queries about that concept should be answered
  <telephone> The telephone for name is phone
- Define some prompts for generic events, e.g., welcome and goodbye
  <welcome> Welcome to the auto-attendant
  <no_data> Sorry, there was no data matching your request.
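A response template such as the telephone one pairs fixed text with slots filled from the query result. This sketch assumes a `{slot}` placeholder syntax handled by Python's `str.format_map`; the slide does not show the actual slot syntax, so that choice, the `TEMPLATES` table, and the `respond` function are hypothetical. The template names and wording follow the slide.

```python
from typing import Dict, Optional

# Hypothetical templates keyed by concept or event; the {slot} syntax is an assumption.
TEMPLATES: Dict[str, str] = {
    "telephone": "The telephone for {name} is {phone}.",
    "welcome": "Welcome to the auto-attendant.",
    "no_data": "Sorry, there was no data matching your request.",
}

def respond(concept: str, result: Optional[Dict[str, str]]) -> str:
    # Generic events need no data; concept answers need a result to fill the slots.
    if concept in ("welcome", "no_data"):
        return TEMPLATES[concept]
    if not result:
        return TEMPLATES["no_data"]
    return TEMPLATES[concept].format_map(result)
```

Keeping the wording in data rather than code is what lets a developer customize responses without touching the dialogue logic.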
Conversational User Interfaces: Input Side
[Figure: speech is converted into a meaning representation]
- Speech: "Find me a flight to Boston on Tuesday"
- Meaning: action=flights to_city=Boston day=Tuesday
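The meaning representation here is a flat set of key=value pairs, which maps naturally onto URL encoding. A sketch using the standard library's `urllib.parse`; joining pairs with "&" follows ordinary URL query-string convention and is an assumption about the exact format SpeechBuilder sends to a back end. The helper names are hypothetical.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode

def encode_frame(frame: Dict[str, str]) -> str:
    # key=value pairs joined by "&"; spaces and special characters are escaped.
    return urlencode(frame)

def decode_frame(encoded: str) -> Dict[str, str]:
    # Inverse operation, as a back end receiving the request might perform it.
    return dict(parse_qsl(encoded))
```

The example frame encodes to `action=flights&to_city=Boston&day=Tuesday`, and multi-word values such as "New York" become `New+York`, which `parse_qsl` turns back into a space.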
Conversational User Interfaces: Output Side
[Figure: a database action yields a meaning representation, which text generation and speech synthesis turn into speech]
- Meaning: flight_num=55 airline=Delta origin=LGA dest=BOS
- Speech: "Delta flight, number fifty five from La Guardia to Boston"
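Producing the spoken sentence from the meaning representation involves expanding codes (LGA, BOS) into speakable names and verbalizing numbers. A minimal template-based sketch; the airport table, the template, and the number-to-words helper (which covers only 0 to 99) are illustrative assumptions, not the system's generation grammar.

```python
from typing import Dict

AIRPORTS = {"LGA": "La Guardia", "BOS": "Boston"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n: int) -> str:
    if not 0 <= n < 100:
        raise ValueError("this sketch only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"

def generate(meaning: Dict[str, str]) -> str:
    # Fill a fixed template, expanding airport codes and spelling out the flight number.
    number = number_to_words(int(meaning["flight_num"]))
    return (f"{meaning['airline']} flight, number {number} "
            f"from {AIRPORTS[meaning['origin']]} to {AIRPORTS[meaning['dest']]}")
```

Spelling numbers out before synthesis avoids relying on the synthesizer to guess whether "55" should be read as "fifty five" or "five five".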
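Conversational User Interfaces: The Whole Picture... Or Is It?
[Figure: speech is converted to meaning, an action is taken, and the resulting meaning passes through text generation and speech synthesis back to speech]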
The Missing Pieces: Context and Dialogue
Conversational User Interfaces: The Whole Picture
[Figure: speech passes through understanding to meaning, then through context resolution and dialogue management to an action; the resulting meaning passes through text generation and speech synthesis back to speech]
Configuring Response Generation
- For each concept present in the domain, define how queries about that concept should be answered
- Configure some generic prompts for summarizing long results
- Define some prompts for generic events, e.g., welcome
Configuring Context Resolution
- Context resolution (discourse) is configured through masking and inheritance of concepts
- Inheritance configures how actions remember concepts, e.g.:
  - User: What is the phone number for Jim Glass?
  - System: Jim Glass's phone number is 3-1640.
  - User: What about his email address?
  - System: Jim Glass's email address is glass@mit.edu.
  - The name concept is inherited
- Masking configures how certain concepts block other concepts, even in the presence of inheritance, e.g.:
  - User: Do you have any restaurants in Boston?
  - System: In Boston, I have the following...
  - User: What about in Times Square?
  - System: In Times Square, New York, I have...
  - The city concept is masked by the neighborhood concept
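Both mechanisms can be sketched as a merge of the previous turn's frame into the current one. Inheritable concepts carry over unless the new utterance already supplies them or mentions a concept that masks them. The tables encode this slide's two examples (name inherited, city masked by neighborhood) plus the "name masks city, cuisine, neighborhood" rule from the configuration-file example. Their format, the set of inheritable concepts, and the merge rule are assumptions, not SpeechBuilder's implementation.

```python
from typing import Dict, Set

# Hypothetical configuration: which concepts carry over, and which concepts block which.
INHERITED: Set[str] = {"action", "name", "city", "cuisine", "neighborhood"}
MASKS: Dict[str, Set[str]] = {
    "neighborhood": {"city"},
    "name": {"city", "cuisine", "neighborhood"},
}

def resolve(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    # Concepts blocked by anything mentioned in the current utterance.
    masked: Set[str] = set()
    for concept in current:
        masked |= MASKS.get(concept, set())
    # Carry over inheritable, unmasked concepts that the new utterance does not restate.
    merged = {k: v for k, v in previous.items()
              if k in INHERITED and k not in masked and k not in current}
    merged.update(current)
    return merged
```

In the restaurant dialogue, "What about in Times Square?" keeps the previous action but drops Boston, so the back end is free to supply the neighborhood's actual city (New York).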
Voice Configuration File
- Developers can also use the Voice Configuration (VCFG) file format to configure SpeechBuilder domains
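Example sentences in a configuration file can bundle several phrasings into one line with a parenthesized group of alternatives, as in the show/give example sentence of the restaurant domain. A sketch of expanding such a sentence into every concrete sentence, assuming non-nested groups whose alternatives are separated by "|" (the exact VCFG syntax is an assumption here, and the function name is hypothetical).

```python
import itertools
import re
from typing import List

GROUP = re.compile(r"\(([^()]*)\)")

def expand_alternatives(sentence: str) -> List[str]:
    # re.split with one capturing group alternates literal text (even indices)
    # with the contents of each "( ... | ... )" group (odd indices).
    parts = GROUP.split(sentence)
    choices = [[part] if i % 2 == 0 else part.split("|") for i, part in enumerate(parts)]
    # The Cartesian product picks one alternative from every group.
    return ["".join(combo) for combo in itertools.product(*choices)]
```

Two groups with two alternatives each expand to four sentences, so the number of generated training sentences grows multiplicatively with the number of groups.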
Deployment
- SpeechBuilder has been functional for the past three years
- Some example domains:
  - Office appliance control
  - Laboratory directory (auto-attendant)
  - Restaurant query system
- Has been used by MIT researchers (experts) as well as novice developers at our partner companies
- SpeechBuilder has been used by students in:
  - A computational linguistics class at Georgetown University
  - A summer class at Johns Hopkins University
- Used in a technology transfer workshop for the pervasive computing project (Oxygen)
- In collaboration with NTT, we have developed a Japanese version of SpeechBuilder. Japanese domains:
  - Bus timetable system
  - Weather information system
Configuring a Speech Interface with SpeechBuilder: Knowledge Representation
- First, define some concepts present in the domain (attributes)
  - Concept values make up the recognizer vocabulary!
- Then, define examples of things to do with the concepts (actions)
  - Examples of attributes are automatically matched to attribute classes