Modeling the Cost of Misunderstandings in the CMU Communicator System
Dan Bohus, Alex Rudnicky
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Abstract. We present a data-driven approach which allows us to quantitatively assess the costs of the various types of errors that a confidence annotator commits in the CMU Communicator spoken dialog system. Knowing these costs, we can determine the optimal tradeoff point between these errors and fine-tune the confidence annotator accordingly. The cost models based on net concept transfer efficiency fit our data quite well, and the relative costs of false positives and false negatives are in accordance with our intuitions. We also find, surprisingly, that for a mixed-initiative system such as the CMU Communicator, these errors trade off equally over a wide operating range.
1. Motivation. Problem Formulation.
  • Intro
  • In previous work [1], we cast the problem of utterance-level confidence annotation as a binary classification task and trained multiple classifiers for this purpose
  • Training corpus: 131 dialogs, 4550 utterances
  • 12 features, from the recognition, parsing and dialog levels
  • 7 classifiers: Decision Tree, ANN, Bayesian Net, AdaBoost, Naïve Bayes, SVM, Logistic Regression
  • Results (mean classification error rates in 10-fold cross-validation):
  • Most of the classifiers obtained statistically indistinguishable results (with the notable exception of Naïve Bayes). The logistic regression model obtained much better performance on a soft metric
2. Cost Models: The Approach
  • To model the impact of FPs and FNs on system performance, we:
  • Identify a suitable dialog performance metric (P) which we want to optimize for
  • Build a statistical regression model on whole sessions, using P as the response variable and the counts of FPs and FNs as predictors
  • P = f(FP, FN)
  • P = k + CostFP × FP + CostFN × FN (linear regression)
  • Performance metrics:
  • User satisfaction (5-point scale): subjective, hard to obtain
  • Completion (binary): too coarse
  • Concept transmission efficiency:
  • CTC = correctly transferred concepts per turn
  • ITC = incorrectly transferred concepts per turn
  • REC = relevantly expressed concepts per turn
  • The Dataset

Mean classification error rate (%)
Random baseline             32
Previous (garble) baseline  25
Classifiers                 16
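The classification setup above (utterance-level confidence annotation as a binary task) can be sketched with a small logistic regression trained from scratch. Everything below is a synthetic stand-in: two made-up features rather than the 12 recognition/parsing/dialog-level features, and randomly generated labels rather than the Communicator corpus.

```python
import math
import random

# Sketch of utterance-level confidence annotation as binary classification.
# Synthetic data: two stand-in features and random labels (1 = utterance
# correctly understood), NOT the actual CMU Communicator training corpus.
random.seed(0)

def make_utterance():
    ok = random.random() < 0.5
    # hypothetical features, shifted by the label so the task is learnable
    x = [random.gauss(1.0 if ok else -1.0, 1.0) for _ in range(2)]
    return x, 1 if ok else 0

data = [make_utterance() for _ in range(500)]

# Logistic regression trained with plain batch gradient descent
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(300):
    gw, gb = [0.0, 0.0], 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        err = p - y                      # gradient of log-loss w.r.t. the logit
        gw[0] += err * x[0]
        gw[1] += err * x[1]
        gb += err
    w[0] -= lr * gw[0] / len(data)
    w[1] -= lr * gw[1] / len(data)
    b -= lr * gb / len(data)

def confidence(x):
    """Annotator's confidence that the utterance was correctly understood."""
    return 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))

error_rate = sum((confidence(x) >= 0.5) != (y == 1) for x, y in data) / len(data)
print(f"training error rate: {error_rate:.2f}")
```

At the default 0.5 acceptance threshold this yields the usual error-rate view of the classifier; the cost models that follow replace that criterion with an error-specific cost.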
So the problem translates to locating the point on the operating characteristic (by moving the classification threshold) which minimizes the total cost (and thus implicitly maximizes the chosen performance metric), rather than the classification error rate. The cost, according to Model 3, is:

Cost = 0.48 × FPNC + 2.12 × FPC + 1.33 × FN + 0.56 × TN
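Given a fitted cost model, tuning reduces to a one-dimensional search over the classification threshold. A minimal sketch, assuming synthetic confidence scores, labels, and concept flags; only the four cost coefficients (0.48, 2.12, 1.33, 0.56) come from the text, and true positives are treated as cost-free since the cost expression has no TP term.

```python
import random

# Sweep the acceptance threshold and pick the point that minimizes total
# cost rather than error rate. Utterances are synthetic: a confidence
# score, a true label (1 = correctly understood), and a flag for whether
# the utterance carried relevant concepts.
random.seed(1)

def make_utt():
    ok = random.random() < 0.6
    score = min(1.0, max(0.0, random.gauss(0.7 if ok else 0.3, 0.2)))
    has_concepts = random.random() < 0.5
    return score, ok, has_concepts

utts = [make_utt() for _ in range(1000)]

def total_cost(threshold):
    cost = 0.0
    for score, ok, has_concepts in utts:
        accepted = score >= threshold
        if accepted and not ok:                    # false positive
            cost += 2.12 if has_concepts else 0.48  # FPC vs FPNC
        elif not accepted and ok:                  # false negative
            cost += 1.33
        elif not accepted and not ok:              # true negative (reprompt)
            cost += 0.56
        # true positives carry no cost term in the model
    return cost

thresholds = [i / 100 for i in range(101)]
best = min(thresholds, key=total_cost)
print(f"cost-optimal threshold: {best:.2f}, cost: {total_cost(best):.1f}")
```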
3. Cost Models: The Results

  • Cost Models Targeting Efficiency
  • 3 successively refined cost models were developed, targeting efficiency as the response variable
  • The goodness of fit for these models (indicated by R²), both on the training data and in 10-fold cross-validation, is illustrated in the table below
  • Model 1: CTC ~ FP + FN + TN + k
  • Model 2: CTC − ITC ~ REC + FP + FN + TN + k
  • added the ITC term so that we also minimize the
    number of incorrectly transferred concepts.
  • REC captures a prior on the verbosity of the user
  • both changes further improve performance
  • Model 3: CTC − ITC ~ REC + FPC + FPNC + FN + TN + k
  • The FP term was split in two, since there are two different types of false positives in the system, which intuitively should have very different costs:
  • FPC = false positives with relevant concepts
  • FPNC = false positives without relevant concepts
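The regression fit behind these models can be sketched in miniature with ordinary least squares via the normal equations. The per-session counts, the noise level, and the "true" coefficients below are illustrative (a two-predictor FP/FN model rather than the full Model 3), not the paper's data or fitted values.

```python
import random

# Miniature version of the regression step: per-session counts of FPs and
# FNs, an efficiency response generated from assumed "true" costs plus
# noise, and ordinary least squares recovering those costs. The true values
# here (intercept 0.4, cost_FP -2.0, cost_FN -1.3) are made up.
random.seed(2)
TRUE = [0.4, -2.0, -1.3]                  # k, cost_FP, cost_FN

sessions = []
for _ in range(200):
    fp, fn = random.randint(0, 5), random.randint(0, 5)
    p = TRUE[0] + TRUE[1] * fp + TRUE[2] * fn + random.gauss(0, 0.1)
    sessions.append(([1.0, fp, fn], p))

# Normal equations (X^T X) beta = X^T y, solved by Gaussian elimination
n = 3
A = [[sum(x[i] * x[j] for x, _ in sessions) for j in range(n)] for i in range(n)]
b = [sum(x[i] * p for x, p in sessions) for i in range(n)]
for col in range(n):                      # forward elimination with pivoting
    piv = max(range(col, n), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for row in range(col + 1, n):
        f = A[row][col] / A[col][col]
        for j in range(col, n):
            A[row][j] -= f * A[col][j]
        b[row] -= f * b[col]
beta = [0.0] * n
for i in range(n - 1, -1, -1):            # back substitution
    beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, n))) / A[i][i]

print("fitted [k, cost_FP, cost_FN]:", [round(v, 2) for v in beta])
```

With enough sessions the fitted coefficients land close to the generating ones; in the real system the fitted signs and magnitudes are what the coefficient table below reports.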

The fact that the cost function is almost constant across a wide range of thresholds indicates that the efficiency of the dialog stays about the same, regardless of the ratio of FPs to FNs that the system makes.
  • Further Analysis
  • Is CTC − ITC an Adequate Metric?
  • Mean: 0.71; standard deviation: 0.28
  • Mean for completed dialogs: 0.82
  • Mean for uncompleted dialogs: 0.57
  • The differences are statistically significant at a very high level of confidence (p = 7.23 × 10⁻⁹)
  • Can We Reliably Extrapolate the Model to Other
    Areas of the ROC ?
  • The distribution of FPs and FNs across dialogs
    indicates that, although the data is obtained
    with the confidence annotator running with a
    threshold of 0.5, we have enough samples to
    reliably estimate the other areas of the ROC.
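The significance claim for the completed vs. uncompleted comparison amounts to a two-sample test on per-dialog efficiencies. A sketch using Welch's t statistic on synthetic values centered on the reported means (0.82 and 0.57) and standard deviation (0.28); the sample sizes are made up, so the resulting statistic is illustrative, not the paper's p = 7.23 × 10⁻⁹.

```python
import math
import random
import statistics

# Welch's t statistic for the completed vs. uncompleted comparison, on
# synthetic per-dialog CTC-ITC values. Means and standard deviation match
# the reported figures; the group sizes (60 / 40) are assumptions.
random.seed(3)
completed   = [random.gauss(0.82, 0.28) for _ in range(60)]
uncompleted = [random.gauss(0.57, 0.28) for _ in range(40)]

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))

t = welch_t(completed, uncompleted)
print(f"t = {t:.2f}")
```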

Model                                      R² (all)  R² (train)  R² (test)
1: CTC ~ FP + FN + TN                      0.81      0.81        0.73
2: CTC−ITC ~ FP + FN + TN                  0.86      0.86        0.78
2: CTC−ITC ~ REC + FP + FN + TN            0.89      0.89        0.83
3: CTC−ITC ~ REC + FPC + FPNC + FN + TN    0.94      0.94        0.90
Model 3 coefficients:
k       0.41
C_REC   0.62
C_FPNC  -0.48
C_FPC   -2.12
C_FN    -1.33
C_TN    -0.55
4. Fine-Tuning the Annotator

  • We want to find the optimal trade-off point on the operating characteristic of the classifier. Implicitly, we are minimizing the classification error rate (FP + FN).
  • Conclusions
  • Proposed a data-driven approach to quantitatively assess the costs of the various types of errors committed by a confidence annotator
  • Models based on efficiency fit the data well; the obtained costs confirm the intuition
  • For the CMU Communicator, the models predict that the total cost stays the same across a large range of the operating characteristic of the confidence annotator

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, 2001.