Title: Automatic Decision Detection in Conversational Speech
1. Automatic Decision Detection in Conversational Speech
- Pei-yun (Sabrina) Hsueh
- Johanna Moore
- Human Communication Research Centre
- University of Edinburgh
- April, 2007
- p.hsueh_at_ed.ac.uk
2. Outline
- Introduction
- Why do we need decision detection?
- What is automatic decision detection, as we define it?
- Corpus and annotations
- Experiments
- Empirical findings
- Conclusion and future work
3. Why Automatic Decision Detection?
- Advances in recording and storage technology have enabled the archiving of meeting conversations.
- However, it is still difficult to find information in these often-lengthy archives.
4. Why Automatic Decision Detection?
- Decisions are an essential outcome of meetings (Pallotta et al., 2005; Rienks et al., 2005).
- Reviewing decisions is essential to the re-use of meeting recordings (Whittaker et al., 2005).
- Use cases:
- you missed a meeting
- you are assigned to a new project
- you have to prepare a report for upper management
5. Meeting browser with decision-related information highlighted
(Screenshot: TOPIC SEGMENTS)
6. Automatic Decision Detection
- (1) Identifying the subset of dialogue acts that contain decision-related information
- Classify decision-related dialogue acts, which support or reflect the decisions
7. In the following dialogue, the group is making decisions on how to find the remote control when it is misplaced
(1) Identifying the subset of dialogue acts that
contain decision-related information
- (1) C: how many times do you really, seriously lose your remote control?
- (2) C: and would a device like that help you to find it?
- (3) B: There might be something that you can do in the circuit board and the chip to make it make a noise or something,
- (4) B: but it would take a lot more development than we have.
- (5) A: Mm-hmm.
- (6) A: Okay, that's a fair evaluation.
- (7) A: Um we've decided not to worry about that for now.
- (8) A: Okay.
(Callout: Decision-related Dialogue Act)
8. Automatic Decision Detection
- (1) Identifying the subset of dialogue acts that contain decision-related information
- Classify decision-related dialogue acts, which support or reflect the decisions
- (2) Identifying the subset of topic segments that contain decisions
- Classify decision-related topic segments, which contain one or more decision-related DAs
- Recognize decision-related DAs in a wider window
- Provide info on what the decisions are about
9. (2) Identifying the subset of topic segments that contain decisions
In a product detail design meeting:
- opening
- presentation of prototype(s)
- evaluation of prototype(s) (3)
- evaluation how to find when misplaced (1)
- evaluation preferred prototype (2)
- evaluation extent of achievement of targets
- costing (1)
- evaluation of project
- ideas for further development
- evaluation of project
(Callout: Decision-related Topic Segment)
10. Multiparty Dialogue Processing
- Different from spoken language research on read speech and two-party dialogue:
- More modalities are used in face-to-face communication (e.g., gesture, head movement, eye contact, body language)
- More media (e.g., presentations, whiteboards, notes)
- Group-level interactions
- Communication models have been proposed:
- Multimodal Discourse Ontology (MMDO) (Niekrasz et al., 2005); argumentation acts (Marchand-Maillet, 2003); argumentation diagrams (Rienks et al., 2005)
- But research on computational modelling of multiparty dialogue is still in its infancy.
11. Related Work: Detecting opinionated speech in conversational speech
- Detect hot spots (Wrede and Shriberg, 2003), group-level interest (Gatica-Perez, 2005), agreement/disagreement (Galley, 2004; Hillard, 2003), action items (Purver et al., 2006)
- These works commonly cast the detection task as a classification task
12. Automatic Decision Detection as a Binary Classification Task
Does this unit contain decision-related information or not?
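To make the binary-classification framing concrete, here is a minimal sketch (not the system described in the talk) of a unit-level yes/no classifier, assuming scikit-learn is available; the toy utterances, labels, and unigram features are invented for illustration, and logistic regression stands in for MaxEnt.

```python
# A minimal sketch: label each unit (dialogue act or topic segment)
# as decision-related (1) or not (0).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data: unit transcripts paired with decision-related labels.
units = [
    "we've decided not to worry about that for now",
    "how many times do you really lose your remote control",
    "okay so we'll go with the advanced chip",
    "um mm-hmm",
]
labels = [1, 0, 1, 0]

# Unigram frequency vectors stand in for the lexical feature set.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(units)

# Logistic regression is the standard MaxEnt formulation for binary labels.
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(vectorizer.transform(["we decided to use the potato prototype"])))
```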
13. Corpora Used
- AMI meetings
- Four different meeting types (scenario-driven):
- kick-off, conceptual design, detail design, wrap-up
- Four different speaker roles:
- PM, ME, IE, ID
- Rich annotations from the corpus:
- manual transcription
- speaker intention (DA class, e.g., Inform, Suggest, Elicit-Assessment, Elicit-Suggestion)
- Minimal units: dialogue acts (DAs)
- Average length: 26 minutes (800 DAs)
- topic segmentation and labelling
- abstractive summary (with a focus on general discussion, decisions, problems, action items)
- extractive summary
14. Decision-related DA Annotation: Three steps
Step 1: Annotator Group A produces an abstractive summary. Annotators work individually.
15. Produce abstractive summaries with a focus on decisions
(Diagram: Transcripts -> Step 1 -> Abstractive summary; minimal unit: sentence)
16. Decision-related DA Annotation: Three steps
- 5 decisions made per meeting (std. dev. 3)
Step 1: Annotator Group A produces an abstractive summary. Step 2: Annotator Group B produces an extractive summary. Annotators work individually.
17. (Diagram: Transcripts -> Step 1 -> Abstractive summary, minimal unit: sentence; Transcripts -> Step 2 -> Extractive summary, minimal unit: dialogue act)
18. Decision-related DA Annotation: Three steps
- 5 decisions made per meeting (std. dev. 3)
- 11% of the dialogue acts are in extractive summaries
Step 1: Group A produces an abstractive summary. Step 2: Group B produces an extractive summary. Step 3: Specify decision links. Annotators work individually.
19. (Diagram: Transcripts -> Step 1 -> Abstractive summary, minimal unit: sentence; Transcripts -> Step 2 -> Extractive summary, minimal unit: dialogue act; Step 3 -> Decision links)
Annotating Decision-related Dialogue Acts
20. Decision-related DA Annotation: Three steps
- 5 decisions made per meeting (std. dev. 3)
- 11% of the dialogue acts are in extractive summaries
- 2 decision links specified per decision (std. dev. 2)
Step 1: Group A produces an abstractive summary. Step 2: Group B produces an extractive summary. Step 3: Specify decision links. Annotators work individually.
21. EXP1: Detecting Decision-Related DAs
- Experiment data:
- 50 meetings selected from 19 series (37,400 DAs)
- 19x476 distinct speakers
- imbalanced class distribution:
- 12.70% positive (554 out of all 4,362 DAs in extractive summaries)
- Perform 5-fold cross validation
- Using a MaxEnt classifier (see the sketch after this list)
- An empirical study shows systematic differences in the features of decision-related DAs:
- cue words, topic classes
- prosody, context (speaker intention and role)
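A sketch of the evaluation protocol just described, 5-fold cross validation of a MaxEnt-style classifier, again assuming scikit-learn; the feature matrix and the roughly 12.7% positive rate are synthetic placeholders, not the AMI data.

```python
# Sketch: 5-fold cross validation of a MaxEnt (logistic regression)
# classifier, reporting precision, recall, and F1 for the minority
# decision-related class.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.random((500, 20))          # placeholder feature matrix (one row per DA)
y = (rng.random(500) < 0.127) * 1  # ~12.7% positive rate, mimicking the data

scores = cross_validate(
    LogisticRegression(max_iter=1000), X, y, cv=5,
    scoring=("precision", "recall", "f1"),
)
for name in ("precision", "recall", "f1"):
    print(name, scores[f"test_{name}"].mean())
```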
22. EXP1 (Baseline): Using Only Automatic Features
- Prosodic features (Murray et al., 2006)
- Duration
- Pause
- Speech rate
- Energy (mean, variation)
- Pitch (mean, variation, slope, min and max after
linearisation) (Shriberg and Stolcke, 2001)
23. EXP1: Using Annotated Features
- Lexical features (LX1):
- Unigram vectors from the manual transcription
- Contextual features (CTXT):
- DA class (current, previous, following)
- Speaker role, meeting type
- Topic features (TOPIC):
- Topic label classes (e.g., agenda, costing, evaluation of prototype, trend watching)
- Position relative to the last topic shift
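One plausible way to combine these heterogeneous feature groups per dialogue act is a dict-based encoding; this is an illustrative sketch assuming scikit-learn's DictVectorizer, and the field names are invented rather than taken from the paper.

```python
# Sketch: merge lexical, contextual, and topic features into one sparse
# vector per dialogue act using a dict-based encoding.
from sklearn.feature_extraction import DictVectorizer

def da_features(da):
    feats = {}
    for word in da["text"].split():          # LX1: unigram counts
        feats["w=" + word] = feats.get("w=" + word, 0) + 1
    feats["da_class=" + da["da_class"]] = 1  # CTXT: current DA class
    feats["role=" + da["role"]] = 1          # CTXT: speaker role
    feats["topic=" + da["topic"]] = 1        # TOPIC: topic label class
    return feats

das = [
    {"text": "we've decided not to worry about that",
     "da_class": "Inform", "role": "PM", "topic": "costing"},
    {"text": "how many times do you lose your remote",
     "da_class": "Elicit-Assessment", "role": "ID", "topic": "evaluation"},
]
X = DictVectorizer().fit_transform(da_features(d) for d in das)
print(X.shape)
```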
24. EXP1: Feature effects on detecting decision-related DAs
- When used alone, LX1 features yield the most
competitive model in terms of both precision and
recall.
25. EXP1: Feature effects on detecting decision-related DAs
- When used alone, LX1 features yield the most competitive model in terms of both precision and recall.
- Further combining TOPIC and PROS features (ALL-CTXT) improves both precision and recall.
(Chart: ALL, LX1, PROS, CTXT, TOPIC)
26. EXP1: Feature effects on detecting decision-related DAs
- When used alone, LX1 features yield the most competitive model in terms of both precision and recall.
- Further combining TOPIC and PROS features (ALL-CTXT) improves both precision and recall.
- Including CTXT features improves precision but degrades recall.
27. EXP1: Feature effects on detecting decision-related DAs
- Lenient-match recognition in a wider window of 10 seconds:
- if a prediction occurs within the 10 seconds preceding or following the actual decision-related DA, it is counted as a match.
- Results suggest that the task of decision-related DA recognition is inherently a fuzzy one.
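The lenient-match metric might be implemented as follows; this sketch assumes DA start times in seconds and interprets the slide's rule as a symmetric 10-second window.

```python
# Sketch: a hypothesized DA counts as a hit if its time falls within
# 10 seconds of some reference decision-related DA.
def lenient_match(hyp_times, ref_times, window=10.0):
    hits = sum(any(abs(h - r) <= window for r in ref_times) for h in hyp_times)
    precision = hits / len(hyp_times) if hyp_times else 0.0
    covered = sum(any(abs(r - h) <= window for h in hyp_times) for r in ref_times)
    recall = covered / len(ref_times) if ref_times else 0.0
    return precision, recall

# Two of three hypotheses land near a reference; both references are covered.
print(lenient_match([12.0, 40.0, 95.0], [15.0, 90.0]))
```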
28. EXP2: Detecting Decision-Related Topic Segments
- Decision-related topic segments:
- segments that contain one or more hypothesized decision-related DAs.
- Recognition in an even wider window (55 DAs / 75 secs)
- Expected by chance: 31.78% (198 out of 623 topic segments).
29. EXP2: Feature effects on detecting decision-related topic segments
- When used alone, LX1 features still yield the
best recall, but TOPIC features yield the best
precision.
30. EXP2: Feature effects on detecting decision-related topic segments
- When used alone, LX1 features still yield the best recall, but TOPIC features yield the best precision.
- Results are similar to EXP1:
- the ALL model yields the best precision
- ALL-CTXT yields the best recall
31. Feature Analysis: Identifying features that characterize decision-related DAs
- Lexical features (cues for decision-related DAs):
- Content words (e.g., "advanced chip")
- Pronouns (e.g., "we" >> "you", "I")
- Fewer negative expressions (e.g., "I don't know", "I don't think")
- Topical features (topic cues):
- Some topic classes (e.g., costing, budget) have a significantly better chance of containing decision-related information.
- Functional segments (e.g., opening, closing, agenda/equipment issues, chitchat) usually do not contain decision-related information.
32. Feature Analysis: Identifying features that characterize decision-related DAs
- Contextual features:
- Speaker role:
- PM dominates (Z-test, p<0.001)
- Speaker intention (current dialogue act):
- Inform, Suggest, Elicit-Assessment, Offer, Elicit-Inform
- as opposed to Stall, Fragment, Backchannel, Be-Negative
- Addressed to all >> addressed to one or two
- Prosodic features:
- Speakers tend to pause before and after decision-related dialogue acts
33. Research Questions Addressed
- Is it possible to derive implicit semantic information, such as decisions, from meeting recordings?
- (1) Given all available features, is it possible to classify the dialogue acts and topic segments that contain decision-related information?
- (2) Which features of decision-related DAs and topic segments actually exhibit demonstrable differences?
34. Conclusion
- (1) Integrating multiple knowledge sources is essential for automatic decision detection.
- The baseline using only automatically generated features (PROS) does not yield competitive models.
- Combining LX1 features and lexically derivable TOPIC features with PROS features (ALL-CTXT) yields competitive models on both tasks:
- detecting decision-related DAs (F1 0.64/0.69)
- detecting decision-related topic segments (F1 0.83)
35. Conclusion
- (1) Integrating multiple knowledge sources is essential for automatic decision detection.
- The baseline using only automatically generated features (PROS) does not yield competitive models.
- Combining LX1 features and lexically derivable TOPIC features with PROS features (ALL-CTXT) yields competitive models on both tasks.
- Further combining CTXT features with all the other available features (ALL) further improves precision on both tasks:
- detecting decision-related DAs (precision 0.72/0.76)
- detecting decision-related topic segments (precision 0.86)
36. Conclusion
- (2) Decision-related DAs do exhibit demonstrable differences across various features.
- Feature selection will benefit the decision models by selecting a subset of characteristic features.
- This will affect how we automatically generate the features that are currently manually annotated (e.g., manual transcripts, topic labels, DA classes).
37. Conclusion
- This is a first step towards integrating multimodal information to derive implicit semantics from multiparty dialogue recordings, with applications in:
- summarization
- information retrieval and extraction
- data mining (relationship discovery)
- organizational memory
38. Future work
- Automatic decision discussion segmentation
- Automatic detection of the decision-related functional role of DAs:
- Initiate, Refine, Support, Rebut, Confirm
- Using automated features:
- automatically recognized words (ASR, WER ~30%)
- automatic topic segmentation (Hsueh et al., EACL 2006; Hsueh and Moore, ACL 2007) and topic labelling (Hsueh and Moore, SLT 2006)
- automatic dialogue act classification (Dielmann et al., 2007)
39. Meeting browser with decision-related information highlighted
(Screenshot: DECISION SUMMARY)
40. Questions? p.hsueh_at_ed.ac.uk
- This work is supported by the AMI project: http://www.amiproject.org
- Our special thanks to the three anonymous reviewers.
- We also thank our project members at the University of Edinburgh and our research partners at TNO, the University of Twente, DFKI, and IDIAP for valuable comments and help.
41. Backup
42. Related Work: Meeting corpora
- CMU (Waibel et al., 2001)
- LDC (Cieri et al., 2002)
- NIST (Garofolo et al., 2004)
- ICSI (Janin et al., 2003)
- IM2/M4 project (Marchand-Maillet, 2003)
- CALO (2005): Y2 scenario data
- AMI (Carletta et al., 2005):
- average length 26 minutes (800 dialogue acts)
43. Challenges Faced
- Mainstream SLU techniques are not sufficient.
- Social interactions are inherently multimodal (Jaimes et al., 2007; Schroeder and Foxe, 2005).
- Meetings violate assumptions that hold in other speech genres.
- It is difficult to automatically extract the descriptive argument structure:
- Multimodal Discourse Ontology (MMDO) (Niekrasz, 2005); argumentation acts (Rienks, 2005; Marchand, 2003)
44. Limitations
- Have the annotations captured all the decision-related dialogue acts?
- percentage agreement: 60%
- Training set size
45. EXP1: Automatic Features Used (Baseline)
- Prosodic features (Shriberg and Stolcke, 2001; Murray et al., 2006):
- Duration:
- number of words spoken in a DA, amount of time elapsed
- Pause:
- pause preceding and following a DA
- Speech rate:
- number of words spoken per second in a DA
- number of syllables per second
- Energy:
- average energy level and variance in each quarter of a DA
- Pitch:
- contour: pitch slope and variance at multiple points of a DA (e.g., 100 ms, 200 ms, 1st quarter)
- maximum and minimum F0 (5%-95%) after linearisation
46. Prior research has addressed the problem of topic segmentation and labelling (Hsueh et al., EACL 2006; Hsueh and Moore, ACL 2007):
- opening
- presentation of prototype(s)
- evaluation of prototype(s)
- how to find when misplaced
- preferred prototype
- extent of achievement of targets
- costing
- evaluation of project
- ideas for further development
- evaluation of project
47. Challenges
- This is a new problem in the field.
- It is not clear whether mainstream SLU and summarisation techniques are applicable:
- face-to-face spontaneous dialogues violate the assumptions they make
- it is not a typical speech summarisation task
- sentiment analysis has not been attempted for detecting opinionated speech of this kind.
48. Questions to answer
- (1) What features can be used to characterize DM dialogue acts?
- -> Empirical analysis
- (2) Given all the potentially characteristic features, is it possible to classify DM dialogue acts?
- -> Training models for the classification task
- (3) Is it possible to computationally select a set of DM-characteristic features?
- -> Exploring feature selection methods
49. EXP1: Feature effects on detecting decision-related DAs
- Combining ALL features outperforms the baseline (F-test, p<0.001).
50. Data: Topic label annotation (Empirical Analysis)
- Topic segmentation: segment each meeting into a number of locally coherent segments
- Topic labelling: label each segment with labels from a standard set
51. EXP2: Feature Selection
- Select the features that occur significantly more often in decision-making dialogue acts than expected by chance.
- Lexical discriminability:
- the association strength between the occurrence of a given N-gram and that of DM dialogue acts.
- Measures (sketched below):
- log-likelihood (LL)
- chi-squared (X2)
- DICE coefficient
- pointwise mutual information (PMI)
(Pipeline: Step 0: Feature Selection, selecting DM-characteristic features; Step 1: Detecting DM subdialogues)
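All four measures can be derived from a 2x2 contingency table of n-gram occurrence against DM membership; this sketch uses the standard textbook formulas, not the paper's code, and the counts are illustrative (they echo the "adv. chip" totals on slide 76).

```python
# Sketch: association measures between an n-gram and DM dialogue acts,
# from a 2x2 contingency table. a = DM DAs containing the n-gram,
# b = non-DM DAs containing it, c = DM DAs without it, d = the rest.
import math

def association(a, b, c, d):
    n = a + b + c + d
    pmi = math.log2((a / n) / (((a + b) / n) * ((a + c) / n)))
    dice = 2 * a / ((a + b) + (a + c))
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Log-likelihood ratio: 2 * sum of observed * ln(observed / expected).
    expected = [
        (a, (a + b) * (a + c) / n),
        (b, (a + b) * (b + d) / n),
        (c, (c + d) * (a + c) / n),
        (d, (c + d) * (b + d) / n),
    ]
    ll = 2 * sum(obs * math.log(obs / exp) for obs, exp in expected if obs > 0)
    return {"PMI": pmi, "DICE": dice, "X2": chi2, "LL": ll}

print(association(a=20, b=50, c=3804, d=102306))
```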
52. EXP2: Feature Selection for Classifying DM dialogue acts
- Compare models using:
- the most discriminative (Q1), the mildly discriminative (Q2), the mildly indiscriminative (Q3), and the least discriminative (Q4) features
- LL and DICE work well.
53. Questions to answer
- (1) What features can be used to characterize DM dialogue acts?
- -> Empirical analysis shows there exist demonstrable differences in features
- (2) Given all the potentially characteristic features, is it possible to classify DM dialogue acts?
- If not, is it possible to computationally select a set of DM-characteristic features?
54. Experiments
- Data preparation (annotation)
- EXP1: Detecting which dialogue acts and segments contain DM points
- EXP2: Computationally selecting characteristic features
- EXP3: Hypothesizing topic labels for DM segments
55. EXP3: Topic Labelling
- Cast the task of topic labelling as a text classification task
- Train language models for each of the topic classes
- Convert the multi-class classification task into multiple binary classification tasks, as sketched below
(Diagram: DM segment -> agenda? yes/no; chitchat? yes/no)
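A minimal sketch of the multi-class-to-binary conversion, using scikit-learn's one-vs-rest wrapper over bag-of-words segment vectors; the toy segments and the use of logistic regression rather than per-class language models are simplifications.

```python
# Sketch: topic labelling as multiple binary yes/no classifiers, one per
# topic class (one-vs-rest), over bag-of-words segment representations.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

segments = [
    "ok the agenda for today is the detailed design",
    "so how much will the chip and the case cost",
    "did you see the football game last night",
]
topics = ["agenda", "costing", "chitchat"]

vec = CountVectorizer().fit(segments)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(vec.transform(segments), topics)
print(clf.predict(vec.transform(["the cost of the buttons is too high"])))
```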
56. EXP3: Exploiting lexical discriminability
- Using LL to select topic-discriminative N-grams can improve the classification accuracy by 13.3%.
57. Automatic Decision Detection
- DM dialogue act detection
- DM segment detection
- DM segment labelling
(Pipeline: Step 0: Feature Selection, identifying DM-characteristic features; Step 1: Detecting DM dialogue acts, identifying potential DM points; Step 2: Detecting DM segments; Step 3: Segment topic labelling, classifying segment topics)
58. Known issue: one decision, many DM dialogue acts
Meeting ES2008d, abstractive summary:
(1) The remote will resemble the potato prototype. (2) There will be no feature to help find the remote when it is misplaced. (3) The corporate logo will be on the remote. (4) The buttons will all be one color. ...
59. In the topic segment "how to find when misplaced"
- A: but um the feature that we considered for it not getting lost .
- B: Right . Well
- B: were talking about that a little bit
- B: when we got that email
- B: and we think that each of these are so distinctive , that it it's not just like another piece of technology around your house .
- B: It's gonna be somewhere that it can be seen .
- A: Mm-hmm .
- B: So we're we're not thinking that it's gonna be as critical to have the loss
- D: But if it's like under covers or like in a couch you still can't see it .
- ...
- A: Okay , that's a fair evaluation .
- A: Getting lost .
- A: Um we so we do we've decided not to worry about that for now .
- A: Okay
(Callouts: TOPIC, Requirement, DM point)
60. Proposed study: Building a schema-based representation
- Schema analysis: identifying the schematic elements of DM dialogue acts:
- Topic: what the decision is about
- Requirement: supporting arguments
- DM point: specifying agreement or disagreement
61. Future Work (A): Schema-based decision detection
- Annotation:
- schematic elements in DM conversations:
- topic, requirement, DM point
- Building models for detecting each schematic element
- Combining the predictions of schematic elements to form a schema-based decision representation (Purver et al., 2006)
62. Future Work (B): Using automatically generated features
- ASR output
- Automatically hypothesized topic labels
- Dialogue act classification:
- backchannels, stalls, fragments
- Dialogue act segmentation
- Extractive summaries
63. Future Work (C): Task-based Evaluation
- Task-Based Evaluation Test (TBET) (Post et al., 2006):
- Pre-questionnaire
- Introduction
- Browser walk-through
- Individual work (role-specific preparation)
- Questionnaire (understanding of previous meetings)
- Team work (performing the current meeting)
- Questionnaire (teamwork evaluation)
64. Evaluation: browser with the DM detection component vs. browser without
65. Future Work (D): Decision detection in new contexts
- Motivation:
- training data are expensive to obtain
- online decision detection
- Determine the degree of domain dependency inherent in the decision detection task:
- scenario vs. non-scenario meetings in the AMI corpus
- How to deal with domain dependency:
- (1) Using domain-independent features:
- number of topical words, subjective words (Wilson, 2005), agreement markers (Cohen, 2002)
- POS tags of the first and the last phrase (Weiqun's chunker)
66. Future Work (D): Decision detection in new contexts
- How to deal with domain dependency:
- (2) Unsupervised lexical approach:
- decision orientation (sentiment analysis) (Turney, 2002)
- topical dynamics
- (3) Machine learning strategies:
- train on a limited amount of training data
- automatically label in-domain data
- meta-learning (ensemble classifiers)
67. Future Work (C): Estimating decision orientation (DO)
- Following Turney (2002):
- HYPOTHESIS: decision-oriented terms often co-occur.
- -> Estimate the association strength of words from data, as sketched below.
- Questions:
- How to choose a small set of decision-oriented seed terms?
- How to measure the association strength of each word context with the set of decision-oriented terms on the web?
(Diagram: Test conversation unit -> Context representation -> Decision orientation, via a Decision Lexicon)
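Following the Turney-style hypothesis above, a unit's decision orientation could be scored as the average PMI between its words and a small seed lexicon, with co-occurrence counts taken from a background corpus; the seed terms, corpus, and counting scheme here are all assumptions for illustration.

```python
# Sketch: decision orientation of a unit as the average co-occurrence
# association of its words with decision-oriented seed terms, estimated
# from a background corpus of utterances (Turney-2002-style PMI).
import math
from collections import Counter

SEEDS = {"decide", "decided", "agree", "choose"}  # assumed seed lexicon

def decision_orientation(unit_words, corpus_utterances):
    word_count, joint_count = Counter(), Counter()
    n = len(corpus_utterances)
    for utt in corpus_utterances:
        toks = set(utt.split())
        has_seed = bool(toks & SEEDS)
        for t in toks:
            word_count[t] += 1
            if has_seed:
                joint_count[t] += 1
    p_seed = sum(1 for u in corpus_utterances if set(u.split()) & SEEDS) / n
    score = 0.0
    for w in unit_words:
        if word_count[w] and joint_count[w]:
            pmi = math.log2((joint_count[w] / n) / ((word_count[w] / n) * p_seed))
            score += pmi
    return score / len(unit_words)

corpus = ["we decided to go with the chip", "um okay", "i agree with that",
          "the chip is expensive", "choose the potato prototype"]
print(decision_orientation(["chip", "prototype"], corpus))
```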
68. To summarize (Expected contributions)
- Establishing a framework for tackling the new problem of SLU in conversational speech:
- provided empirical evidence
- developed models
(Diagram: Step 3: Labelling DM segments, classifying segment topics)
69. Automatic decision detection: DM dialogue act detection, DM segment detection, topic labelling
DECISION:
- Evaluation: how to find when misplaced
- "Um we've decided not to worry about that for now"
70. Ultimate Goal: Automatically extract information for summarisation and question answering in conversational speech
72. Thank you!
- Questions?
- p.hsueh_at_ed.ac.uk
- This work is supported by the AMI project: http://www.amiproject.org
- My special thanks to Johanna, Steve, Jean, Natasa, Stephan, Rieks, Theresa Wilson, Heriberto, Zhang, Ivan, David, Weiqun, Gabriel, and other partners at TNO, DFKI, and the Univ. of Twente for valuable comments and help.
73. Casting the DM detection task as a binary classification task
- Apply the supervised classification framework
(Diagram: Training set of DM conversation units (YES) and non-DM conversation units (NO) -> Feature Extraction -> DM Detection Model; on the test set the model decides "this is a decision-making point")
74. Future Work (3): More features
- Estimated decision orientation
- Estimated topic dynamics
- Lexically related features:
- number of topical words
- number of subjective words (Wilson, 2005)
- number of agreement markers (Cohen, 2002)
- POS tags of the first and the last phrase (Weiqun's chunker)
- Video features:
- per-speaker, per-motion-zone estimation (e.g., chairman: strong motion)
75. Data Annotation Procedure (Empirical Analysis)
- (1) Human annotators browse through a meeting record.
- (2) They are then asked to produce an abstractive summary:
- decision summary: what decisions were made in the meeting?
- (3) Another set of annotators are asked to produce an extractive summary and to specify summary links between the extractive and abstractive summaries one by one:
- extract a subset of the dialogue acts of this meeting.
- go through the extracted dialogue acts and link them with sentences in the abstractive summaries. It is not obligatory to have a link.
76. EXP2: Feature Selection
- Select the features that occur significantly more often in decision-making dialogue acts than expected by chance.
- Compute the expected occurrence if the occurrence of the n-gram were independent of DM dialogue acts:
- E("adv. chip" in DM) = 20 x (3,824 / 106,180) ≈ 0.72
(Pipeline: Step 0: Feature Selection, selecting DM-characteristic features; Step 1: Detecting DM subdialogues)
77. EXP2: Feature Selection for Classifying DM dialogue acts
- 47 meetings, 5-fold cross validation
- Lexical discriminability:
- the association strength between the occurrence of a given N-gram and that of DM dialogue acts.
- Train the model using the same number of the most discriminative (Q1), mildly discriminative (Q2), mildly indiscriminative (Q3), and least discriminative (Q4) features, selected with different lexical discriminability measures.
78. Future Work (A): Schema-based decision detection
- Schema analysis (from multi-coder data):
- analyzing the common schematic elements of DM dialogue acts
- analyzing the types of disagreement between annotations from different annotators
- analyzing the drift in DM dialogue act recognition across annotations from the same annotators
- Characteristic feature identification
80. Questions remaining
- How to choose the set of decision-oriented terms?
- use all the unigrams that occur in decision-making dialogue acts
- use the log-likelihood ratio to select the most discriminative 25% of unigrams that occur in decision-making dialogue acts
- use an extraction-pattern bootstrapping strategy? (Meta-Bootstrapping and Basilisk)
- What corpus other than the web can be used to estimate the association strength with the set of seed terms?
- meeting transcripts from the web?
- any alternatives?
81. Future Work (2): (2.2) Estimating topic dynamics
- Define topical dynamics as the distance to the current topic model.
- Evaluate the new data against the current model, as sketched below.
- If the probability is low, the dynamics of changing topics is high.
- Adapt the model's parameters given the new data.
(Diagram: Test conversation unit -> Context representation -> Topic dynamics, via a Topic model)
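A sketch of the "evaluate new data against the current model" step, using an add-alpha smoothed unigram topic model; the smoothing, vocabulary size, and update rule are assumptions, not the proposed system.

```python
# Sketch: topical dynamics as the (low) log-probability of the new unit
# under the current unigram topic model; low probability = high dynamics.
# The model is then updated with the new counts.
import math
from collections import Counter

class TopicModel:
    def __init__(self, alpha=0.5):
        self.counts, self.total, self.alpha = Counter(), 0, alpha

    def logprob(self, words, vocab_size=10000):
        # Average add-alpha smoothed log-probability per word.
        return sum(
            math.log((self.counts[w] + self.alpha) /
                     (self.total + self.alpha * vocab_size))
            for w in words
        ) / len(words)

    def update(self, words):
        self.counts.update(words)
        self.total += len(words)

model = TopicModel()
model.update("we discuss the remote control design".split())
new = "the remote control buttons".split()
print("avg log-prob:", model.logprob(new))  # low value suggests a topic shift
model.update(new)                           # adapt the model to the new data
```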
82. Future Work (2): (2.2) Estimating topical dynamics
- Detect statistical outliers along the dimension of decision orientation:
- by finding isolated peaks of decision orientation
- by measuring the degree to which the N-grams in the test unit do not fit the topic model of previously received words
- that is, the probability of the test data given the original model
(Callout: Local Decision Orientation)
83. Future Work (2): (2.3) Adapting to new contexts
- Machine learning strategies:
- re-use the out-of-domain labelled data
- use the automatically labelled in-domain data
- meta-learning: ensemble classifiers
84. Related Work: Detecting opinionated speech in conversations
- Detect hot spots (Wrede and Shriberg, 2003):
- where the level of affect is high in the voices
- Detect group-level interest (Gatica-Perez, 2005):
- the degree of engagement displayed by the group, both in the voices and in the actions taken
- Detect agreement/disagreement (Galley, 2004; Hillard, 2003):
- whether major decisions are reached or not
- Detect action items (Purver et al., 2006):
- what tasks are assigned to whom
85. Related Work: Detecting opinionated speech in conversations
86. Future Work (2): (2.1) Estimating decision orientation (DO)
- Computationally characterize the potential decision-making orientation:
- compute the association strength of a subdialogue with a set of decision-oriented terms
- e.g., words that are more closely associated with the term "decide" are more likely to be used in decision-making conversation
- the more decision-oriented terms in a subdialogue, the more likely it is to be a DM subdialogue
87. EXP1: Preliminary Results
- Settings:
- Round 1 (25 meetings)
- Round 2 (50 meetings)
- Evaluation metrics: accuracy (F1 score/precision/recall)
- Exact match: the hypothesized and referenced spurts are the same.
- Lenient match: the hypothesized spurt is not more than N units away from any referenced spurt (N = 10 seconds; can be changed to 1 DACT later).
88. Data preparation
- Segment the whole sequence of utterances into minimal units:
- dialogue acts: from human annotations
- spurts: consecutive speech without pauses greater than 0.5 seconds in between (see the sketch after this list)
- Label each unit as decision-making (YES) or non-decision-making (NO):
- decision-making dialogue act: an extracted dialogue act that has been annotated as linked to the abstractive decisions
- decision-making spurt: a spurt that overlaps with the decision-linked dialogue acts
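The spurt segmentation rule stated above (split wherever the inter-word pause exceeds 0.5 seconds) might look like this; the (start, end, token) timing format is assumed.

```python
# Sketch: split a stream of timed words into spurts wherever the pause
# between consecutive words exceeds 0.5 seconds.
def to_spurts(words, max_pause=0.5):
    spurts, current = [], [words[0]]
    for prev, cur in zip(words, words[1:]):
        if cur[0] - prev[1] > max_pause:   # pause = next start - last end
            spurts.append(current)
            current = []
        current.append(cur)
    spurts.append(current)
    return spurts

# (start, end, token) triples, an assumed format
stream = [(0.0, 0.3, "okay"), (0.4, 0.8, "so"), (2.1, 2.5, "we've"),
          (2.5, 3.0, "decided")]
print([[t for _, _, t in s] for s in to_spurts(stream)])
```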
89. Train and test the model
- Choose a classifier: tried J48, MaxEnt, and SVM
- chose MaxEnt as it makes efficient and stable predictions
- Train the model using different lexical features
- Train/test:
- Round 1: 25-fold leave-one-out cross validation
- Round 2: 50-fold leave-one-out cross validation
90. EXP1: Constructing the context representation
- Represent each unit as a vector with M dimensions, each of which indicates the frequency of a particular lexical feature (n-gram) in the spurt.
- E.g., an example DACT in meeting IS1008c:
- "and then i uh think we should go with the solar cells as well as the um microphone and speaker on the advanced chip"
- First-order representation of bigrams (see the sketch below)
91. Experiment 1: Detecting DM subdialogues
- 5-fold cross validation:
- train on 80% of the data, test on 20%
- Feature selection methods
92. Motivation
- Applications:
- meeting browsers (providing the right level of detail for users to interpret what has transpired)
- specialized summarization
- information extraction for question answering
- group information management (GIM) / computer-supported collaborative work (CSCW)
94. To answer question (1)
- Conduct an exploratory study to empirically analyze the correspondence of a variety of features with DM subdialogues:
- Does the distribution of feature values in DM subdialogues exhibit a demonstrable difference from that in general discussions?
(Diagram: Identify potential features -> Empirical Analysis -> Forming hypotheses)
95. To answer question (2)
- Apply the supervised classification framework previously proposed for detecting other types of opinionated speech to the problem of detecting DM subdialogues.
(Diagram: Identify potential features -> Exploratory study, forming hypotheses -> Develop models, testing hypotheses)
96. To answer question (2): Selecting a subset of discriminative features
- Experiment with feature selection methods:
- log-likelihood measures (LL), chi-squared statistics (X2)
- pointwise mutual information (PMI), DICE coefficient (DICE)
(Diagram: Identify potential features -> Exploratory study, forming hypotheses -> Build classification models, testing hypotheses)
97. EXP3: Quick Summary
- Selecting the N-gram features that have high lexical discriminability improves the performance of models on topic labelling:
- LL, DICE > X2 >> PMI
- The language modelling approach achieves at best 78.6% precision on the topic labelling task.
- It is especially good for detecting FUNCTIONAL segments, the topic classes that are least likely to contain DM points.