Title: Modeling the Cost of Misunderstandings
Modeling the Cost of Misunderstandings in the CMU Communicator System
Dan Bohus, Alex Rudnicky
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Abstract
We present a data-driven approach that allows us to quantitatively assess the costs of the various types of errors that a confidence annotator commits in the CMU Communicator spoken dialog system. Knowing these costs, we can determine the optimal tradeoff point between these errors and fine-tune the confidence annotator accordingly. The cost models based on net concept transfer efficiency fit our data quite well, and the relative costs of false positives and false negatives are in accordance with our intuitions. We also find, surprisingly, that for a mixed-initiative system such as the CMU Communicator, these errors trade off equally over a wide operating range.
1. Motivation and Problem Formulation
- Introduction
- In previous work [1], we cast the problem of utterance-level confidence annotation as a binary classification task and trained multiple classifiers for this purpose.
- Training corpus: 131 dialogs, 4550 utterances.
- 12 features from the recognition, parsing, and dialog levels.
- 7 classifiers: decision tree, ANN, Bayesian network, AdaBoost, Naïve Bayes, SVM, logistic regression.
- Results (mean classification error rates in 10-fold cross-validation): most of the classifiers obtained statistically indistinguishable results (with the notable exception of Naïve Bayes); the logistic regression model obtained much better performance on a soft metric. A cross-validation sketch follows this list.
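As a rough illustration of this earlier setup, the sketch below runs a 10-fold cross-validation over several of the listed classifiers. The feature matrix and labels are random placeholders, not the actual Communicator corpus, and only the scikit-learn estimators that map directly onto the listed classifiers are included.

```python
# Hedged sketch: 10-fold cross-validation over a few of the classifiers
# listed above. X and y are random placeholders standing in for the 12
# utterance-level features and the binary misunderstanding labels.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(4550, 12))        # placeholder: 4550 utterances x 12 features
y = rng.integers(0, 2, size=4550)      # placeholder: 1 = correctly understood

classifiers = {
    "decision tree": DecisionTreeClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, clf in classifiers.items():
    accuracy = cross_val_score(clf, X, y, cv=10)   # 10-fold cross-validation
    print(f"{name:20s} mean error rate: {1 - accuracy.mean():.3f}")
```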
2. Cost Models: The Approach
- To model the impact of false positives (FPs) and false negatives (FNs) on system performance, we:
- identify a suitable dialog performance metric (P) which we want to optimize for
- build a statistical regression model over whole sessions, using P as the response variable and the counts of FPs and FNs as predictors (a fitting sketch follows this list)
- P = f(FP, FN)
- P = k + C_FP · FP + C_FN · FN   (linear regression)
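A minimal sketch of the regression step, assuming per-session counts of FPs and FNs and a per-session performance value P are already available (the arrays below are made-up placeholders):

```python
# Fit P = k + C_FP * FP + C_FN * FN by ordinary least squares.
import numpy as np

fp = np.array([3, 1, 0, 5, 2, 4])               # false positives per session (placeholder)
fn = np.array([1, 2, 0, 4, 1, 3])               # false negatives per session (placeholder)
p  = np.array([0.9, 0.8, 1.1, 0.2, 0.8, 0.4])   # performance metric per session (placeholder)

design = np.column_stack([np.ones(len(fp)), fp, fn])
(k, c_fp, c_fn), *_ = np.linalg.lstsq(design, p, rcond=None)
print(f"k = {k:.2f}, C_FP = {c_fp:.2f}, C_FN = {c_fn:.2f}")
```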
- Performance metrics
- User satisfaction (5-point scale): subjective, hard to obtain
- Completion (binary): too coarse
- Concept transmission efficiency (a computation sketch follows these definitions):
- CTC = correctly transferred concepts / turn
- ITC = incorrectly transferred concepts / turn
- REC = relevantly expressed concepts / turn
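A small sketch of how these per-turn metrics could be computed from session-level concept counts (the function and its arguments are hypothetical; the poster does not specify the exact bookkeeping):

```python
# Concept transmission efficiency metrics, computed per session.
def concept_metrics(correct, incorrect, relevant, num_turns):
    """Return (CTC, ITC, REC) as concepts per turn."""
    ctc = correct / num_turns      # correctly transferred concepts / turn
    itc = incorrect / num_turns    # incorrectly transferred concepts / turn
    rec = relevant / num_turns     # relevantly expressed concepts / turn
    return ctc, itc, rec

# Example: 18 correct, 3 incorrect, 25 relevant concepts over 20 turns
print(concept_metrics(18, 3, 25, 20))   # (0.9, 0.15, 1.25)
```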
- The Dataset
Mean classification error rates (%), 10-fold cross-validation:
- Random baseline: 32
- Previous (GARBLE) baseline: 25
- Trained classifiers: 16
So the problem translates to locating the point on the operating characteristic (reached by moving the classification threshold) which minimizes the total cost, and thus implicitly maximizes the chosen performance metric, rather than the point that minimizes the classification error rate. The cost, according to Model 3, is:

Cost = 0.48 · FP_NC + 2.12 · FP_C + 1.33 · FN + 0.56 · TN
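The sketch below illustrates this threshold search, assuming each utterance comes with a confidence score, the true label, and a flag indicating whether it carried relevant concepts; all the data here is randomly generated and the 0.7/0.8 proportions are arbitrary.

```python
# Sweep the classification threshold and pick the one with minimal total cost.
import numpy as np

rng = np.random.default_rng(1)
conf = rng.uniform(size=1000)                    # annotator confidence scores
ok = rng.uniform(size=1000) < 0.7                # True = correctly understood
has_concepts = rng.uniform(size=1000) < 0.8      # utterance carries relevant concepts

def total_cost(threshold):
    accept = conf >= threshold
    fp = accept & ~ok                            # accepted but misunderstood
    fn = ~accept & ok                            # rejected although correct
    tn = ~accept & ~ok                           # rejected and misunderstood
    fp_c = (fp & has_concepts).sum()             # FPs carrying relevant concepts
    fp_nc = (fp & ~has_concepts).sum()           # FPs without relevant concepts
    # Cost function of Model 3, with the coefficients quoted above.
    return 0.48 * fp_nc + 2.12 * fp_c + 1.33 * fn.sum() + 0.56 * tn.sum()

thresholds = np.linspace(0.05, 0.95, 19)
costs = np.array([total_cost(t) for t in thresholds])
print("cost curve:", np.round(costs, 1))
print("best threshold:", thresholds[costs.argmin()])
```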
3. Cost Models: The Results
- Cost Models Targeting Efficiency
- Three successively refined cost models were developed, targeting efficiency as the response variable.
- The goodness of fit of these models (indicated by R²), both on the training data and in a 10-fold cross-validation, is shown in the table below.
- Model 1: CTC = k + C_FP · FP + C_FN · FN + C_TN · TN
- Model 2: CTC − ITC = k + C_REC · REC + C_FP · FP + C_FN · FN + C_TN · TN
- The ITC term was added to the response so that we also minimize the number of incorrectly transferred concepts.
- REC captures a prior on the verbosity of the user.
- Both changes further improve the fit.
- Model 3: CTC − ITC = k + C_REC · REC + C_FPC · FP_C + C_FPNC · FP_NC + C_FN · FN + C_TN · TN
- The FP term was split in two, since there are two different types of false positives in the system, which intuitively should have very different costs:
- FP_C: false positives containing relevant concepts
- FP_NC: false positives containing no relevant concepts
The fact that the cost function is almost constant across a wide range of thresholds indicates that the efficiency of the dialog stays about the same, regardless of the ratio of FPs to FNs that the system makes.
- Further Analysis
- Is CTC − ITC an Adequate Metric?
- Mean: 0.71; standard deviation: 0.28
- Mean for completed dialogs: 0.82
- Mean for uncompleted dialogs: 0.57
- The differences are statistically significant at a very high level of confidence (p = 7.23 × 10⁻⁹).
- Can We Reliably Extrapolate the Model to Other Areas of the ROC?
Areas of the ROC ? - The distribution of FPs and FNs across dialogs
indicates that, although the data is obtained
with the confidence annotator running with a
threshold of 0.5, we have enough samples to
reliably estimate the other areas of the ROC.
Model (response ~ predictors)                 R² (all)   R² (train)   R² (test)
CTC ~ FP, FN, TN                                0.81       0.81         0.73
CTC−ITC ~ FP, FN, TN                            0.86       0.86         0.78
CTC−ITC ~ REC, FP, FN, TN                       0.89       0.89         0.83
CTC−ITC ~ REC, FP_C, FP_NC, FN, TN              0.94       0.94         0.90
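The train and cross-validated R² values in the table could be obtained along these lines (a sketch only; the per-session predictor matrix and response below are synthetic placeholders, not the actual Communicator sessions):

```python
# Goodness of fit for a session-level linear cost model: R^2 on the
# training data and mean R^2 under 10-fold cross-validation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_sessions = 131                                           # placeholder: one row per session
X = rng.poisson(2.0, size=(n_sessions, 5)).astype(float)   # placeholder REC, FP_C, FP_NC, FN, TN
# arbitrary synthetic relation standing in for the true CTC - ITC response
y = 0.4 + 0.6 * X[:, 0] - 0.1 * X[:, 1:].sum(axis=1) + rng.normal(0, 0.1, n_sessions)

model = LinearRegression().fit(X, y)
print("R2 (train):     ", round(model.score(X, y), 2))
print("R2 (10-fold CV):", round(cross_val_score(model, X, y, cv=10, scoring="r2").mean(), 2))
```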
Fitted coefficients (Model 3):
k = 0.41
C_REC = 0.62
C_FPNC = -0.48
C_FPC = -2.12
C_FN = -1.33
C_TN = -0.55
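Writing the fitted coefficients out as a single equation (a reconstruction from the table above; the negative signs mean that each error type lowers the predicted efficiency):

```latex
\[
  \widehat{CTC - ITC} \;=\; 0.41 \;+\; 0.62\,REC \;-\; 0.48\,FP_{NC}
  \;-\; 2.12\,FP_{C} \;-\; 1.33\,FN \;-\; 0.55\,TN
\]
```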
4. Fine-tuning the Annotator
- We want to find the optimal trade-off point on the operating characteristic of the classifier: rather than implicitly minimizing the classification error rate (FP + FN), we choose the threshold that minimizes the total cost.
- Conclusions
- We proposed a data-driven approach to quantitatively assess the costs of the various types of errors committed by a confidence annotator.
- The models based on efficiency fit the data well, and the obtained costs confirm our intuitions.
- For the CMU Communicator, the models predict that the total cost stays roughly the same across a large range of the operating characteristic of the confidence annotator.