Using Model Trees For Evaluating Dialog Error Conditions Based on Acoustic Information
Goal
- Use model trees for evaluating user utterances in response to system errors.
- Input: acoustic features from the user's speech signal.
- Output: a measure representing user activation.
- Develop an online, objective, human-centered evaluation metric for spoken dialog systems.
Abe Kazemzadeh, Sungbok Lee, and Shrikanth Narayanan
Computer Science, Electrical Engineering, and Linguistics
SAIL Lab, Viterbi School of Engineering, University of Southern California
Motivation
- Errors are a prevalent phenomenon in spoken dialog systems.
- Evaluate and optimize dialog systems.
- Obtain feedback from user behavior.
- Synthesize low-level features into one real-valued measurement of a user's activation.
Results
Histograms of the model tree output for the whole corpus (histogram 1), for error responses (histogram 2), and for non-error responses (histogram 3). The lower-left plot shows precision and recall.
Data
- Communicator Travel Planning Systems, June 2000 recordings.
- Annotated to describe the way that users become aware of and react to errors.
- 141 dialogs, 2,586 utterances.
Model Trees
- A machine learning technique, similar to decision trees and regression trees.
- Outputs a continuous, real-valued number based on a linear regression model at each leaf node.
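To make the idea concrete, here is a minimal sketch of a model tree, assuming NumPy and scikit-learn are available: it grows a decision tree over the acoustic features and fits one linear regression model per leaf, so predictions are continuous, real-valued numbers. The class and parameter names are illustrative, not the implementation used in this work.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

class SimpleModelTree:
    """Decision tree whose leaves each hold a linear regression model,
    so predictions are continuous, real-valued numbers."""

    def __init__(self, max_depth=3, min_leaf=20):
        self.tree = DecisionTreeRegressor(max_depth=max_depth,
                                          min_samples_leaf=min_leaf)
        self.leaf_models = {}

    def fit(self, X, y):
        # Grow the tree structure on the acoustic features.
        self.tree.fit(X, y)
        # Fit one linear model per leaf on the samples routed to it.
        leaves = self.tree.apply(X)
        for leaf_id in np.unique(leaves):
            mask = leaves == leaf_id
            self.leaf_models[leaf_id] = LinearRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        # Route each sample to its leaf and apply that leaf's linear model.
        leaves = self.tree.apply(X)
        out = np.empty(len(X))
        for leaf_id in np.unique(leaves):
            mask = leaves == leaf_id
            out[mask] = self.leaf_models[leaf_id].predict(X[mask])
        return out
```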
Best correlation with user surveys occurred when
model tree output sums were normalized for dialog
length and when only the highest 30 were
considered.
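A hedged sketch of the per-dialog score implied above: the model tree outputs for a dialog are summed and normalized for dialog length. Interpreting "only the highest 30" as the 30 largest per-dialog outputs is an assumption here; the poster's wording leaves the exact selection rule open.

```python
import numpy as np

def dialog_score(outputs, top_k=30):
    """Per-dialog score: sum of the top_k model tree outputs, normalized for dialog length."""
    n_utterances = len(outputs)                        # dialog length in utterances
    top = np.sort(np.asarray(outputs))[::-1][:top_k]   # keep the highest outputs (assumption)
    return top.sum() / n_utterances                    # normalize for dialog length
```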
Methodology
Evaluation Metric                                          Correlation With   Correlation With
                                                           Tag Data           Model Tree Output
It was easy to get info I wanted                           .412               .389
I found it easy to understand what the sys. said           .035               .092
I knew what I could say or do at each point in the dial.   .269               .311
The system worked the way I expected it to                 .365               .498
I would like to use the system regularly                   .332               .409
- Train using the annotated data: if an utterance is an error response, set the model tree target to 1; otherwise, set it to 0.
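A minimal sketch of that target construction, assuming the annotations are available as per-utterance records with an is_error_response flag (a hypothetical field name); the binary targets are then paired with the acoustic feature matrix for training.

```python
import numpy as np

def make_targets(utterances):
    """Target is 1.0 for utterances annotated as error responses, 0.0 otherwise."""
    return np.array([1.0 if u["is_error_response"] else 0.0 for u in utterances])

# X: per-utterance acoustic feature matrix aligned with the annotations.
# model = SimpleModelTree().fit(X, make_targets(utterances))
```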
Conclusion
- The overall ability to pick out error responses is 65% precision and 63% recall.
- The model tree approach allows for a threshold that can shift the preference toward precision or recall (see the sketch after this list).
- Correlation between the model tree analysis and survey results was moderate.
- Different questions showed different levels of correlation.
- Model tree output can be interpreted as an indicator of user state and can show a dialog activation landscape, which can be used in user emotion tracking, e.g., to identify dialog hotspots.
- Future work will aim to further this study by
  - Testing other methods of synthesizing lower-level features, in particular Bayesian networks.
  - Examining other corpora; currently analyzing the All My Sons radio play.
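To illustrate the precision/recall threshold mentioned above, here is a hedged sketch of sweeping a decision threshold over the continuous model tree output; function and variable names are illustrative, not from the original system.

```python
import numpy as np

def precision_recall_at(threshold, outputs, labels):
    """Flag utterances whose model tree output meets the threshold as error responses."""
    predicted = np.asarray(outputs) >= threshold
    labels = np.asarray(labels).astype(bool)
    tp = np.sum(predicted & labels)                               # true positives
    precision = tp / predicted.sum() if predicted.sum() else 0.0
    recall = tp / labels.sum() if labels.sum() else 0.0
    return precision, recall

# Raising the threshold trades recall for precision; lowering it does the opposite.
# for t in np.linspace(0.1, 0.9, 9):
#     print(t, precision_recall_at(t, outputs, labels))
```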