Title: Research Challenges for Spoken Language Dialog Systems
- Julie Baca, Ph.D.
- Center for Advanced Vehicular Systems
- Mississippi State University
- Computer Science Graduate Seminar
- November 27, 2002
Overview
- Define dialog systems
- Describe research issues
- Present current work
- Give conclusions and discuss future work
What is a Dialog System?
- Current commercial voice products require adherence to a command-and-control language, e.g.,
- User: "Plan route."
- Such interfaces are not robust to variations from
the fixed words and phrases.
What is a Dialog System? (cont.)
- Dialog systems seek to provide a natural conversational interaction between the user and the computer system, e.g.,
- User: "Is there a way I can get to Canal Street from here?"
Domains for Dialog Systems
- Travel reservation
- Weather forecasting
- In-vehicle driver assistance
- On-line learning environments
Dialog System Information Flow
- Must model two-way flow of information
- User-to-system
- System-to-user
[Figure: dialog system components: Speech Recognition, NLP, Dialog Manager, Application Database, Response Generation, TTS]
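The two-way information flow between these components can be sketched as one pass per user turn: user-to-system through recognition and understanding, system-to-user through response generation and synthesis. This is a minimal sketch, not the system described in this talk; every component (`speech_recognition`, `nlp`, `DialogManager`, `response_generation`, `tts`) is a hypothetical stub, and the database contents are invented placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical application database: place type -> placeholder result.
DATABASE = {"gas station": "Example Station on Canal Street"}


def speech_recognition(audio: str) -> str:
    # Stub recognizer: treats the "audio" as an already-decoded transcript.
    return audio.lower().strip()


def nlp(text: str) -> dict:
    # Stub semantic parser: fills a "place" slot if a known place is mentioned.
    frame = {}
    for place in DATABASE:
        if place in text:
            frame["place"] = place
    return frame


@dataclass
class DialogManager:
    history: list = field(default_factory=list)

    def decide(self, frame: dict) -> dict:
        # Record the turn, then either query the database or ask for more input.
        self.history.append(frame)
        if "place" not in frame:
            return {"act": "clarify"}
        place = frame["place"]
        return {"act": "inform", "place": place, "result": DATABASE[place]}


def response_generation(action: dict) -> str:
    if action["act"] == "clarify":
        return "Where would you like to go?"
    return f"The nearest {action['place']} is {action['result']}."


def tts(text: str) -> str:
    # Stub synthesizer: a real engine would return audio samples.
    return text


def run_turn(audio: str, dm: DialogManager) -> str:
    # User-to-system flow, then system-to-user flow.
    frame = nlp(speech_recognition(audio))
    action = dm.decide(frame)
    return tts(response_generation(action))


dm = DialogManager()
print(run_turn("Is there a gas station nearby?", dm))
print(run_turn("Hello", dm))
```

Keeping the Dialog Manager as the only component that touches the database mirrors the diagram, where recognition and synthesis never see application data directly.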
Research Issues
- Many fundamental problems must be solved for these systems to mature.
- Three general areas include:
- Automatic Speech Recognition (ASR)
- Natural Language Processing (NLP)
- Human-computer Interaction (HCI)
NLP Issue for Dialog Systems: Semantics
- Must assess meaning, not just syntactic correctness.
- Therefore, must handle ungrammatical inputs, e.g.,
- "The nearest ..... station is is there a gas station nearby?"
NLP Issue: Semantic Representation (1)
- For NLP, use semantic grammars
- Semantic frame with slots and fillers
- <destination> -> <prep> <place>
- <prep> -> nearest
- <place> -> gas station
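A robust parser can fill this frame from the ungrammatical input on the previous slide by spotting grammar phrases and skipping words the grammar does not cover. This is a simplified sketch, not the parser of [1] or [2]; the `GRAMMAR` table, `match_at`, and `robust_parse` are hypothetical names, and only the two phrases from this slide are covered.

```python
# Terminal phrases for the pre-terminals in the slide's grammar:
#   <destination> -> <prep> <place>
GRAMMAR = {
    "prep": ["nearest"],
    "place": ["gas station"],
}


def match_at(words, i):
    """Return (slot, phrase, length) if a grammar phrase starts at words[i]."""
    for slot, phrases in GRAMMAR.items():
        for phrase in phrases:
            tokens = phrase.split()
            if words[i:i + len(tokens)] == tokens:
                return slot, phrase, len(tokens)
    return None


def robust_parse(utterance):
    # Strip punctuation, then scan left to right.
    words = utterance.lower().replace("?", " ").replace(".", " ").split()
    slots = {}
    i = 0
    while i < len(words):
        hit = match_at(words, i)
        if hit is None:
            i += 1  # skip words the grammar does not cover (fillers, repeats)
            continue
        slot, phrase, length = hit
        slots.setdefault(slot, phrase)  # keep the first filler for each slot
        i += length
    # Wrap the filled slots in the <destination> frame if anything matched.
    return {"destination": slots} if slots else {}


print(robust_parse("The nearest ..... station is is there a gas station nearby?"))
# {'destination': {'prep': 'nearest', 'place': 'gas station'}}
```

The fragment "station" alone does not match `gas station`, so the false start is skipped and only the complete phrase later in the utterance fills the `place` slot.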
NLP Issue: Semantic Representation (2)
- Must also represent requests such as
- "How do I get from Canal Street to Royal Street?"
- <directions> -> <start> <destination>
- <destination> -> <prep> <place>
- <place> -> <street_name> | <business>
- <street_name> -> Canal St | Royal St
- <prep> -> <to_prep> | <near_prep>
- <near_prep> -> nearest | closest
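The directions grammar can be run directly as a small recursive-descent parser that builds a parse tree and then reads the start and destination fillers out of it. This is a sketch under stated assumptions: the slide does not expand `<start>`, `<business>`, or `<to_prep>`, so the rules for `start`, `from_prep`, `to_prep`, and `business` below are hypothetical, as are the function names.

```python
# The slide's grammar as nonterminal -> list of alternatives.
# Any symbol that is not a key is a terminal word.
GRAMMAR = {
    "directions": [["start", "destination"]],
    "start": [["from_prep", "place"]],          # assumed: not expanded on the slide
    "destination": [["prep", "place"]],
    "place": [["street_name"], ["business"]],
    "street_name": [["canal", "st"], ["royal", "st"]],
    "business": [["gas", "station"]],           # assumed example filler
    "prep": [["to_prep"], ["near_prep"]],
    "from_prep": [["from"]],                    # assumed
    "to_prep": [["to"]],                        # assumed
    "near_prep": [["nearest"], ["closest"]],
}


def parse(symbol, words, i):
    """Expand `symbol` at words[i]; return (tree, next_index) or None.
    Alternatives are tried in order; there is no backtracking into
    earlier siblings, which suffices for this small grammar."""
    if symbol not in GRAMMAR:  # terminal
        if i < len(words) and words[i] == symbol:
            return symbol, i + 1
        return None
    for alternative in GRAMMAR[symbol]:
        children, j = [], i
        for part in alternative:
            result = parse(part, words, j)
            if result is None:
                break
            child, j = result
            children.append(child)
        else:  # every part of this alternative matched
            return (symbol, children), j
    return None


def find(tree, label):
    """Depth-first search for the first subtree labeled `label`."""
    if isinstance(tree, str):
        return None
    name, children = tree
    if name == label:
        return tree
    for child in children:
        hit = find(child, label)
        if hit is not None:
            return hit
    return None


def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


def directions_frame(utterance):
    # Normalize to the grammar's vocabulary ("Street" -> "st").
    words = utterance.lower().replace("?", "").replace("street", "st").split()
    for i in range(len(words)):  # skip uncovered words such as "how do i get"
        result = parse("directions", words, i)
        if result is not None:
            tree, _ = result
            start = find(find(tree, "start"), "place")
            dest = find(find(tree, "destination"), "place")
            return {"start": " ".join(leaves(start)),
                    "destination": " ".join(leaves(dest))}
    return {}


print(directions_frame("How do I get from Canal Street to Royal Street?"))
# {'start': 'canal st', 'destination': 'royal st'}
```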
NLP Issue: Semantic Representation (3)
- Two approaches:
- Hand-craft the grammar for the application, using robust parsing to understand meaning [1, 2].
- Problem: time and expense
- Use a statistical approach, generating initial rules and using annotated, tree-banked data to discover the full rule set [3, 4].
- Problem: requires annotated training data
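The statistical approach depends on annotated parses. Its simplest form, shown here, is relative-frequency (maximum-likelihood) estimation of rule probabilities, where every rule observed in the data joins the rule set. The methods cited in [3, 4] are considerably more involved; this sketch only illustrates why annotated data is the bottleneck. The tiny `TREEBANK` is invented for illustration.

```python
from collections import Counter

# Hypothetical annotated (tree-banked) data: each parse is the list of
# rules (left-hand side, right-hand side) used to derive one utterance.
TREEBANK = [
    [("destination", ("prep", "place")), ("prep", ("nearest",)), ("place", ("gas station",))],
    [("destination", ("prep", "place")), ("prep", ("closest",)), ("place", ("gas station",))],
    [("destination", ("prep", "place")), ("prep", ("nearest",)), ("place", ("restaurant",))],
]


def estimate_rule_probs(treebank):
    """Relative-frequency estimate P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).
    Every rule seen in the data becomes part of the rule set."""
    rule_counts = Counter(rule for parse in treebank for rule in parse)
    lhs_counts = Counter(lhs for parse in treebank for lhs, _ in parse)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


for (lhs, rhs), p in sorted(estimate_rule_probs(TREEBANK).items()):
    print(f"<{lhs}> -> {' '.join(rhs)}  P={p:.2f}")
```

A rule that never appears in the annotated data (for example, `<near_prep> -> closest` for a place type never asked about) gets no probability at all, which is the training-data problem the slide names.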
ASR/NLP Issue: Reducing Errors
- Most systems use a loose coupling of ASR and NLP.
- Try earlier integration of semantics with the recognizer.
- Incorporate dialog state into the underlying statistical model.
- Problems:
- Increases search space
- Requires training data
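A much looser approximation of bringing dialog state into the recognizer's statistics is to rescore the recognizer's N-best list with a state-dependent prior: score = acoustic log-score + λ · log P(concept | dialog state). This sketch does not enlarge the search space the way true integration would; the state names, prior values, `concept_of` tagger, and recognizer scores are all hypothetical.

```python
import math

# Hypothetical P(concept | dialog state) values; a real system would
# estimate these from dialog data.
STATE_PRIORS = {
    "awaiting_destination": {"place": 0.8, "time": 0.2},
    "awaiting_time": {"place": 0.2, "time": 0.8},
}


def concept_of(hypothesis):
    # Crude stand-in for a semantic tagger.
    return "time" if "o'clock" in hypothesis or "noon" in hypothesis else "place"


def rescore(nbest, state, weight=1.0):
    """Pick the hypothesis maximizing
    acoustic_log_score + weight * log P(concept | state)."""
    priors = STATE_PRIORS[state]

    def total(item):
        hypothesis, acoustic = item
        return acoustic + weight * math.log(priors[concept_of(hypothesis)])

    return max(nbest, key=total)[0]


# Two acoustically close hypotheses with made-up recognizer scores.
nbest = [("two o'clock", -12.0), ("to Royal Street", -12.5)]
print(rescore(nbest, "awaiting_destination"))  # to Royal Street
print(rescore(nbest, "awaiting_time"))         # two o'clock
```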
NLP Issue: Resolving Meaning Using Context
- Must maintain knowledge of the conversational context.
- After a request for the nearest gas station, the user says, "What is it close to?"
- Resolving "it": anaphora
- Another follow-up by the user:
- "How about restaurant?"
- Resolving with "nearest": ellipsis
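Both follow-ups can be handled by keeping two pieces of context between turns: the entity in focus (for "it") and the last complete request (for the elliptical fragment). This is a minimal sketch assuming the NLP component already produces slot frames; the `DiscourseContext` class, the slot names, and the frames are hypothetical.

```python
class DiscourseContext:
    """Hypothetical discourse component: tracks the entity in focus and the
    last complete request so later fragments can be interpreted."""

    def __init__(self):
        self.focus = None
        self.last_request = None

    def interpret(self, frame):
        resolved = dict(frame)
        # Ellipsis: a fragment with no dialog act inherits the missing
        # slots from the last complete request.
        if "act" not in resolved and self.last_request is not None:
            resolved = {**self.last_request, **resolved}
        # Anaphora: "it" refers to the entity currently in focus.
        for slot, value in list(resolved.items()):
            if value == "it" and self.focus is not None:
                resolved[slot] = self.focus
        # Any frame naming a place becomes the new focus and request.
        if "place" in resolved:
            self.focus = resolved["place"]
            self.last_request = resolved
        return resolved


ctx = DiscourseContext()
# Frames are hand-written here; in a real system they come from the NLP component.
print(ctx.interpret({"act": "find", "prep": "nearest", "place": "gas station"}))
print(ctx.interpret({"act": "landmarks", "object": "it"}))  # "What is it close to?"
print(ctx.interpret({"place": "restaurant"}))               # "How about restaurant?"
```

The second turn resolves `object` to `gas station`; the third inherits `act` and `prep` (`nearest`) from the first turn and replaces only the `place` slot.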
Resolving Meaning: Discourse Analysis
- To resolve such requests, the system must track the context of the conversation.
- This is typically handled by a discourse analysis component in the Dialog Manager.
Dialog Manager: Discourse Analysis
- Anaphora resolution approach: use a focus mechanism, assuming the conversation has a focus [5].
- For our example, "gas station" is the current focus.
- But what about:
- "I'm at Food Max. How do I get to a gas station close to it and a video store close to it?"
- Problem: resolving the two occurrences of "it".
Dialog System
[Figure: dialog system components, now including Discourse Analysis: Speech Recognition, NLP, Discourse Analysis, Dialog Manager, Application Database, Response Generation, TTS]
Dialog Manager: Clarification
- Often cannot satisfy a request in one iteration.
- The previous example may require clarification from the user, e.g.,
- "Do you want to go to the gas station first?"
HCI Issue: System vs. User Initiative
- What level of control do you provide the user in the conversation?
[Figure: initiative scale ranging from Computer to Human]
- Computer initiative, e.g., C: "Please say departure city."
- Human initiative, e.g., U: "Tell me how to get to the Hilton."
Mixed Initiative
- Total system initiative provides low usability.
- Total user initiative introduces a higher error rate.
- Thus, a mixed-initiative approach, balancing usability and error rate, is taken most often.
- Allowing the user to adapt the initiative level explicitly has also shown merit [6].
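The difference between system and mixed initiative shows up in how a slot-filling dialog treats an answer that supplies more than was asked. In this hypothetical sketch, `initiative="system"` accepts only the slot the system prompted for, while `initiative="mixed"` accepts every slot the user volunteered; exposing the parameter is one way a user-adaptable level, as in [6], could be wired in. The slot names and prompts are assumptions.

```python
SLOTS = ["origin", "destination"]
PROMPTS = {
    "origin": "Where are you starting from?",
    "destination": "Where do you want to go?",
}


def next_turn(frame, user_slots, asked, initiative):
    """Merge the user's parsed answer into `frame` and return the next
    (slot, prompt) pair, or (None, None) when every slot is filled.
    initiative="system": keep only the slot that was asked for.
    initiative="mixed": keep every slot the user supplied."""
    if initiative == "system":
        user_slots = {k: v for k, v in user_slots.items() if k == asked}
    frame.update(user_slots)
    for slot in SLOTS:
        if slot not in frame:
            return slot, PROMPTS[slot]
    return None, None


# Asked for the origin, the user answers "From Canal Street to the Hilton."
answer = {"origin": "Canal Street", "destination": "Hilton"}
print(next_turn({}, answer, "origin", "mixed"))   # (None, None)
print(next_turn({}, answer, "origin", "system"))  # ('destination', 'Where do you want to go?')
```

Under system initiative the extra information is discarded and the user must repeat it, which is the usability cost the slide describes; under mixed initiative one turn suffices, at the price of accepting more unverified recognizer output.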
ASR/HCI Issue: Error Handling
- How to handle possible errors?
- Assign a confidence score to the result of the recognizer.
- For results with a lower confidence score, request clarification or revert to system-oriented initiative.
- Can incorporate dialog state in computing the confidence score [7].
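A confidence-driven policy can map the score to three actions: accept, confirm ("Did you say ...?"), or fall back to a system-initiative reprompt. This sketch is not the semantic confidence measure of [7]; it simply nudges the recognizer's score up or down depending on whether the recognized concept is expected in the current dialog state. The thresholds (0.8, 0.5) and the bonus (0.1) are arbitrary illustrative values.

```python
def choose_action(asr_confidence, concept, expected_concepts,
                  accept=0.8, confirm=0.5, state_bonus=0.1):
    """Adjust the recognizer confidence by dialog state, then pick an action.
    All thresholds are hypothetical and would need tuning on real data."""
    if concept in expected_concepts:
        conf = asr_confidence + state_bonus
    else:
        conf = asr_confidence - state_bonus
    conf = min(max(conf, 0.0), 1.0)  # keep the score in [0, 1]
    if conf >= accept:
        return "accept"
    if conf >= confirm:
        return "confirm"  # e.g., "Did you say Royal Street?"
    return "reprompt_system_initiative"  # e.g., "Please say the street name."


print(choose_action(0.75, "place", {"place"}))  # accept
print(choose_action(0.75, "time", {"place"}))   # confirm
print(choose_action(0.30, "place", {"place"}))  # reprompt_system_initiative
```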
HCI Issue: Response Generation
- How to present the response to the user in a way that minimizes cognitive load?
- Varies depending on whether output is speech-only or speech/visual.
- Speech-only output must respect the user's short-term memory limitations, e.g., lists must be short, timed appropriately, and allow repetition.
- Speech/visual output must be complementary, e.g., redundancy and timing are important.
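For speech-only output, the "short lists with repetition" guideline can be applied by splitting results into small spoken chunks, each ending with an offer to continue or repeat. This is a hypothetical sketch; the chunk size of 3 is an illustrative choice rather than a recommended value, and pause timing between chunks is not modeled.

```python
def speak_list(items, chunk_size=3):
    """Split a result list into short spoken chunks, each ending with an
    offer to continue (if more remain) or to repeat the current chunk."""
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    prompts = []
    for n, chunk in enumerate(chunks):
        text = ", ".join(chunk) + "."
        if n < len(chunks) - 1:
            text += " Say 'more' to continue or 'repeat' to hear these again."
        else:
            text += " Say 'repeat' to hear these again."
        prompts.append(text)
    return prompts


for prompt in speak_list(["Shell", "Exxon", "Chevron", "BP"]):
    print(prompt)
```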
HCI Issue: Evaluating Dialog Systems
- How to compare and evaluate dialog systems?
- PARADISE (Paradigm for Dialog System Evaluation) provides a standard framework [8].
PARADISE: Evaluating Dialog Systems
- Task success
- Was the necessary information exchanged?
- Efficiency/Cost
- Number of dialog turns, task completion time
- Qualitative
- ASR rejections, timeouts, helps
- Usability
- User satisfaction with ASR, task ease,
interaction pace, system response
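PARADISE combines these measures into one performance function, Performance = α · N(κ) − Σ_i w_i · N(c_i), where κ is the kappa statistic for task success, κ = (P(A) − P(E)) / (1 − P(E)), the c_i are cost measures (efficiency and qualitative), N is Z-score normalization, and α and w_i come from a linear regression against user satisfaction [8]. The sketch below computes the function for a set of dialogs; here P(E) is estimated from the column totals of the task confusion matrix, and the weights in the example are made-up placeholders rather than regression results.

```python
import statistics


def kappa(confusion):
    """confusion[i][j]: how often key value j was recorded as value i.
    P(A) is the diagonal share; P(E) sums squared column proportions."""
    total = sum(sum(row) for row in confusion)
    p_agree = sum(confusion[i][i] for i in range(len(confusion))) / total
    col_sums = [sum(row[j] for row in confusion) for j in range(len(confusion))]
    p_chance = sum((t / total) ** 2 for t in col_sums)
    return (p_agree - p_chance) / (1 - p_chance)


def z_normalize(values):
    # Z-score over all dialogs (assumes the values are not all equal).
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def paradise_performance(kappas, costs, alpha, weights):
    """kappas: one kappa per dialog; costs: cost name -> per-dialog values;
    alpha, weights: regression coefficients (assumed already estimated)."""
    nk = z_normalize(kappas)
    nc = {name: z_normalize(vals) for name, vals in costs.items()}
    return [alpha * nk[d] - sum(weights[name] * nc[name][d] for name in costs)
            for d in range(len(kappas))]


print(kappa([[2, 0], [0, 2]]))  # 1.0: perfect task success
print(paradise_performance([0.0, 1.0], {"turns": [10, 20]},
                           alpha=0.5, weights={"turns": 0.25}))
```

Normalizing each measure before weighting is what lets task success and costs measured in different units (turns, seconds, counts of rejections) be combined in one score.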
Current Work
- Sponsored by CAVS
- Examining
- In-vehicle Environment
- Manufacturing Environment
- Multidisciplinary Team
- CS, ECE, IE
- Baca, Picone, Duffy
- ECE graduate students
- Hualin Gao, Zheng Feng
Current Work: In-vehicle Dialog System
- Specific ASR Issues for In-vehicle Environment
- Real-time performance
- Noise cancellation
Current Work: In-vehicle Dialog System (cont.)
- Other Significant Issues
- Reducing error rate
- Graceful error handling and mixed-initiative strategy
- Response generation to reduce user cognitive load
- Evaluation
Current Work: In-vehicle Dialog System (cont.)
- Approach
- Develop prototype in-vehicle system
- Initial focus on ASR and NLP issues
- Integrate real-time recognizer [9]
- Employ noise-cancellation techniques [10]
- Use semantic grammar for NLP
- Examine tighter integration of ASR and NLP
- Incorporate dialog state in underlying
statistical models for ASR
Current Work: In-vehicle Dialog System (cont.)
- Second phase, focus on
- Response generation
- Mixed initiative strategies
- Evaluation
Current Work: Workforce Training Dialog System
- Significant issues in manufacturing environment
- Recognition issues
- Real-time performance
- Noisy environments
- Understanding issues
- Multimodal interface for reducing error rate, e.g., voice and pen [11].
- HCI/Human Factors Issues
- Response generation to integrate speech and
visual output
Research Significance
- Advance the development of dialog systems technology by addressing fundamental issues as they arise in the automotive domains.
- Potential areas: ASR, NLP, HCI
References
- [1] S. J. Young and C. E. Proctor, "The design and implementation of dialogue control in voice operated database inquiry systems," Computer Speech and Language, vol. 3, no. 4, pp. 329-353, 1992.
- [2] W. Ward, "Understanding spontaneous speech," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, 1991, pp. 365-368.
- [3] R. Pieraccini and E. Levin, "Stochastic representation of semantic structure for speech understanding," Speech Communication, vol. 11, no. 2, pp. 283-288, 1992.
- [4] Y. Wang and A. Acero, "Evaluation of spoken grammar learning in the ATIS domain," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Orlando, Florida, 2002.
- [5] C. Sidner, "Focusing in the comprehension of definite anaphora," in Computational Models of Discourse, M. Brady and R. Berwick, eds., Cambridge, MA: The MIT Press, 1983, pp. 267-330.
- [6] D. Litman and S. Pan, "Empirically evaluating an adaptable spoken language dialog system," in Proceedings of the International Conference on User Modeling (UM '99), Banff, Canada, 1999.
- [7] S. Pradhan and W. Ward, "Estimating semantic confidence for spoken dialogue systems," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2002), Orlando, Florida, USA, May 2002.
- [8] M. Walker et al., "PARADISE: A framework for evaluating spoken dialogue agents," in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL-97), pp. 271-289, 1997.
- [9] F. Zheng, J. Hamaker, F. Goodman, B. George, N. Parihar, and J. Picone, "The ISIP 2001 NRL evaluation for recognition of speech in noisy environments," presented at the Speech In Noisy Environments (SPINE) Workshop, Orlando, Florida, USA, November 2001.
- [10] F. Zheng and J. Picone, "Robust low perplexity voice interfaces," MITRE Corporation, December 31, 2001.
- [11] S. Oviatt, "Taming speech recognition errors within a multimodal interface," Communications of the ACM, vol. 43, no. 9, pp. 45-51, Sept. 2000 (special issue on "Conversational Interfaces").