Dialogue Structure and Pronoun Resolution (Presentation Transcript)

1
Dialogue Structure and Pronoun Resolution
  • Joel Tetreault and James Allen
  • University of Rochester
  • Department of Computer Science
  • DAARC
  • September 23, 2004

2
WELCOME TO DAARC!!!
3
Reference in Spoken Dialogue
  • Resolving anaphoric expressions correctly is
    critical in task-oriented domains
  • Makes conversation easier for humans
  • The reference resolution module (RRM) provides
    feedback to other components in the system, e.g.
    incremental parsing, the interpretation module
  • We investigate how to improve the RRM
  • Discourse structure could be effective in reducing
    the search space of antecedents and improving
    accuracy (Grosz and Sidner, 1986)
  • Paucity of empirical work: Byron and Stent (1998),
    Eckert and Strube (2001), Byron (2002)

4
Goal
  • To evaluate whether shallow approaches to dialogue
    structure can improve a reference resolution
    algorithm (LRC is used as the baseline model to
    augment)
  • Investigated two models:
  • Eckert and Strube (manual and automatic versions)
  • Literal QUD model (manual)

5
Outline
  • Background
  • Dialogue Act synchronization (Eckert and Strube
    model)
  • QUD (Craige Roberts)
  • Monroe Corpus
  • Algorithm
  • Results
  • 3rd person pronoun evaluation
  • Dialogue Structure
  • Summary

6
Past approaches in structure and reference
  • Veins: the nuclei of RST trees are the most salient
    discourse units, so the entities in these units are
    more salient than others
  • Tetreault (2003): a Penn Treebank subset annotated
    with RST. Used Grosz and Sidner (GS) approximations
    to try to improve on the LRC baseline.
  • Result: performed the same as the baseline
  • Veins decreased performance slightly
  • Problem: fine-grained approaches (RST) are difficult
    to annotate reliably and in real time
  • Perhaps shallow approaches can work?

7
Literal QUD
  • Questions Under Discussion (Craige Roberts, Jonathan
    Ginzburg): what are we talking about? Topics create
    discourse segments
  • Literally, questions or modals can be viewed as
    creating a discourse segment
  • Result: questions provide a shallow discourse
    structuring, and that may be enough to improve
    performance, especially in a task-oriented domain
  • Entities in the QUD main segment can be viewed as
    the topic
  • The segment is closed when the question is answered
    (using acknowledgment sequences and changes in the
    entities used)
  • Only entities from the answer and entities in the
    question remain accessible
  • Can be used in TRIPS to reduce the search space of
    entities (sets the context size); see the sketch
    after this list
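
A minimal Python sketch of the bookkeeping just described. The
is_question and is_answered predicates stand in for the speech-act
and acknowledgment cues, and the data layout is an assumption, not
the TRIPS implementation:

  # Hypothetical sketch of literal-QUD bookkeeping: a question opens
  # a segment; answering it collapses the segment so that only the
  # entities of the question and its answer stay accessible.
  def update_history(utterances, is_question, is_answered):
      qud_stack = []  # open segments: (index into history, topic entities)
      history = []    # one entity list per surviving discourse unit
      for utt in utterances:
          if is_question(utt):
              qud_stack.append((len(history), list(utt["entities"])))
          history.append(list(utt["entities"]))
          if qud_stack and is_answered(utt):
              start, topic = qud_stack.pop()
              # Collapse the segment into question + answer entities.
              history[start:] = [topic + list(utt["entities"])]
      return history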

8
QUD Annotation Scheme
  • Annotate:
  • Start utterance
  • End utterance
  • Type (aside, repeated question, unanswered,
    open-ended, clarification)
  • Kappa (compared with reconciled data)

9
Example - QUD
  utt06 U: Where is it?
  utt07 U: Just a second
  utt08 U: I can't find the Rochester airport
  utt09 S: It's
  ------------------------------------------------------
  utt10 U: I think I have a disability with maps
  utt11 U: Have I ever told you that before
  utt12 S: It's located on brooks avenue
  utt13 U: Oh thank you
  utt14 S: Do you see it?
  utt15 U: Yes

  (QUD-entry start: utt06, end: utt13, type: clarification)
  (QUD-entry start: utt10, end: utt11, type: aside)
10
Example - QUD (utt10-11 processed)
  utt06 U: Where is it?
  utt07 U: Just a second
  utt08 U: I can't find the Rochester airport
  utt09 S: It's
  [utt10, utt11 removed]
  ------------------------------------------------------
  utt12 S: It's located on brooks avenue
  utt13 U: Oh thank you
  utt14 S: Do you see it?
  utt15 U: Yes

  (QUD-entry start: utt06, end: utt13, type: clarification)
  (QUD-entry start: utt10, end: utt11, type: aside)
11
Example - QUD (utt13 processed)
  utt06-13 collapsed: the Rochester airport,
  brooks avenue
  ------------------------------------------------------
  utt14 S: Do you see it?
  utt15 U: Yes

  (QUD-entry start: utt06, end: utt13, type: clarification)
12
QUD Issues
  • Issue 1: it is easy to detect questions (using
    speech-act information), but how do you know a
    question has been answered?
  • Cue words, multiple acknowledgements, and changes in
    the entities discussed provide strong clues that a
    question is finishing, but general questions such as
    "how are we going to do this?" can be ambiguous
  • Issue 2: what is more salient to a pronoun in a QUD:
    the QUD topic or a more recent entity?

13
Dialogue Act Segmentation
  • ES: a model to resolve all types of pronouns (3rd
    person and abstract) in spoken dialogue
  • Intuition: grounding is very important in spoken
    dialogue
  • Utterances that are not acknowledged by the listener
    may not be in the common ground and are thus not
    accessible to pronominal reference

14
Dialogue Act Segmentation
  • Each utterance is marked as:
  • (I) contains content (initiation), question
  • (A) acknowledgment
  • (C) combination of the above
  • (N) none of the above
  • Basic algorithm: utterances that are not
    acknowledged or not in a string of Is are removed
    from the discourse before the next sentence is
    processed (see the sketch after this list)
  • Evaluation showed improvement for pronouns referring
    to abstract entities, and strong annotator
    reliability
  • Pronoun performance? Unclear: there was no
    comparison against the same measure without the DA
    model
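
A rough Python sketch of the removal rule, under one reading of the
description above; the codes and the exact keep/remove condition are
simplified assumptions:

  # Hypothetical sketch of the ES filtering rule: keep an utterance
  # only if the next utterance acknowledges it or continues a string
  # of I (initiation) utterances; N utterances are dropped.
  def filter_history(coded_utts):
      """coded_utts: list of (text, code) pairs, codes I/A/C/N."""
      kept = []
      for i, (text, code) in enumerate(coded_utts):
          nxt = coded_utts[i + 1][1] if i + 1 < len(coded_utts) else None
          acknowledged = nxt in ("A", "C")
          in_i_string = code in ("I", "C") and nxt in ("I", "C")
          if code != "N" and (acknowledged or in_i_string):
              kept.append((text, code))
      return kept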

15
Example DA model
  utt06 U: Where is it?                                     (I)
  utt07 U: Just a second                                    (N)
  utt08 U: I can't find the Rochester airport               (I)
  utt09 S: It's                                             (N)
  utt10 U: I think I have a disability with maps (removed)  (I)
  utt11 U: Have I ever told you that before                 (I)
  utt12 S: It's located on brooks avenue                    (I)
  utt13 U: Oh thank you                                     (A)
  utt14 S: Do you see it?                                   (I)
  utt15 U: Yes                                              (A)
16
Parsing Monroe Domain
  • Domain: the Monroe Corpus of 20 transcriptions
    (Stent, 2001) of human subjects collaborating on
    emergency rescue (911) tasks
  • Each dialogue was at least 10 minutes long, and most
    were over 300 utterances long
  • Work presented here focuses on 5 of the dialogues
    (1756 utterances, 278 3rd person pronouns)
  • Goal: develop a corpus of sentences parsed with rich
    syntactic, semantic, and discourse information
  • Able to parse the 5-dialogue sub-corpus with 84%
    accuracy
  • For more details, see the ACL Discourse Annotation
    Workshop '04

17
TRIPS Parser
  • Broad-coverage, deep parser
  • Uses a bottom-up algorithm with a CFG and a
    domain-independent ontology combined with a domain
    model
  • Flat, unscoped LF with events and labeled semantic
    roles based on FrameNet
  • Semantic information for noun phrases based on
    EuroWordNet

18
Parser information for Reference
  • Rich parser output is helpful for discourse
    annotation and reference resolution
  • Referring expressions identified (pronouns, NPs,
    implicit pronouns ("impros"))
  • Verb roles and temporal information (tense,
    aspect) identified
  • Noun phrases have semantic information associated
    with them
  • Speech act information (question, acknowledgment)
  • Discourse markers (so, but)
  • Semi-automatic annotation increases reliability

19
Semantics Example: "an ambulance"

  (TERM :VAR V213818
        :LF (A V213818 (:* LF::LAND-VEHICLE W::AMBULANCE))
        :INPUT (AN AMBULANCE)
        :SEM (F::PHYS-OBJ
              (SPATIAL-ABSTRACTION SPATIAL-POINT)
              (GROUP -)
              (MOBILITY LAND-MOVABLE)
              (FORM ENCLOSURE)
              (ORIGIN ARTIFACT)
              (OBJECT-FUNCTION VEHICLE)
              (INTENTIONAL -)
              (INFORMATION -)
              (CONTAINER (OR + -))
              (TRAJECTORY -)))

20
Reference Annotation
  • Annotated dialogues for reference with undergraduate
    researchers (created a Java tool, PronounTool)
  • Markables determined by LF terms
  • Identification numbers determined by the VAR field
    of the LF term
  • Used a stand-off file to encode what each pronoun
    refers to (refers-to) and the relation between
    pronoun and antecedent (relation); see the sketch
    after this list
  • A post-processing phase assigns a unique
    identification number to coreference chains
  • Also annotated coreference between definite noun
    phrases
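
A sketch of what one stand-off record might contain; the slides name
only the refers-to and relation fields, so the remaining keys and
values here are hypothetical:

  # Hypothetical stand-off record for one pronoun; only refers-to
  # and relation come from the slides, the other keys are invented.
  record = {
      "var": "V213901",        # pronoun's LF VAR (illustrative value)
      "refers-to": "V213818",  # antecedent's identification number
      "relation": "IDENTITY",  # relation between pronoun and antecedent
  }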

21
Reference Annotation
  • Used a slightly modified MATE scheme; pronouns are
    divided into the following types:
  • IDENTITY (Coreference) (278)
  • Includes set constructions (6)
  • FUNCTIONAL (20)
  • PROPOSITION/D.DEIXIS (41)
  • ACTION/EVENT (22)
  • INDEXICAL (417)
  • EXPLETIVE (97)
  • DIFFICULT (5)

22
LRC Algorithm
  • LRC: a modified centering algorithm (Tetreault,
    2001) that does not use the Cb or transitions, but
    keeps a Cf-list (history) for each utterance
  • While processing an utterance's entities (left to
    right), do the following (see the sketch after this
    list):
  • Push the entity onto Cf-list-new; for a pronoun p,
    attempt to resolve:
  • Search through Cf-list-new (left to right), taking
    the first candidate that meets gender, agreement,
    binding, and semantic feature constraints
  • If none is found, search past utterances' Cf-lists,
    starting from the previous utterance back to the
    beginning of the discourse
  • When p is resolved, push the pronoun with the
    semantic features of its antecedent onto Cf-list-new
  • For more details, see SemDial '04
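
A compact Python sketch of the loop as described above. The
compatible check is a placeholder for the real gender, agreement,
binding, and semantic feature constraints:

  # Hypothetical sketch of the LRC resolution loop; entities are
  # dicts, and 'compatible' stands in for the constraint checks.
  def resolve_utterance(entities, history):
      """entities: the utterance's entities, left to right;
      history: prior utterances' Cf-lists, most recent first."""
      cf_new = []
      for ent in entities:
          if ent.get("is_pronoun"):
              # Current utterance first, then prior utterances in order.
              candidates = cf_new + [c for cf in history for c in cf]
              antecedent = next(
                  (c for c in candidates if compatible(ent, c)), None)
              if antecedent is not None:
                  # The pronoun inherits its antecedent's semantics.
                  ent = {**ent, "sem": antecedent.get("sem")}
          cf_new.append(ent)
      history.insert(0, cf_new)  # most recent utterance first
      return history

  def compatible(pronoun, candidate):
      # Placeholder for gender, agreement, binding, and semantic
      # feature constraints.
      return pronoun.get("agr") == candidate.get("agr")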

23
LRC Algorithm with Structure Info
  • Augmented the algorithm with extensions to handle
    QUD and ES input; a sketch of the combination
    follows this list
  • For QUD: at the start and end of processing an
    utterance, QUDs are started (pushed on a stack) or
    ended (entities are collapsed), so the Cf-list
    history changes
  • For ES: each utterance is assigned a DA code and
    then removed or kept depending on the next utterance
    (whether it is an acknowledgement, or part of a
    series of Is)
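
One way the pieces might fit together; open_quds, close_quds, and
es_prune are hypothetical hooks in the spirit of the earlier
sketches, not the authors' code:

  # Hypothetical driver: QUD boundaries mutate the Cf-list history
  # around resolution, and the ES filter prunes it afterwards.
  def resolve_dialogue(utterances):
      history = []
      for utt in utterances:
          open_quds(utt, history)           # push a QUD on a question
          history = resolve_utterance(utt["entities"], history)
          close_quds(utt, history)          # collapse a finished segment
          history = es_prune(utt, history)  # drop unacknowledged utts
      return history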

24
Results
25
Error Analysis
  • Though QUD and the semantics baseline performed the
    same (89 errors each), they each got 3 pronouns
    right that the other did not
  • Baseline's 3:
  • 3 where QUD's collapsing of nodes removed the
    correct antecedent
  • QUD's 3:
  • 2 associated with blocking off an aside
  • 1 associated with collapsing (intervening nodes
    blocked)
  • 15 pronouns both got wrong, but with different
    predictions
  • For the remaining 71, both made the same error

26
Issues
  • Structuring methods are probably more trouble than
    they are worth with the corpora available right now
  • They also affect only a few pronouns
  • Segment ends are the least reliable
  • What constitutes an end?
  • The 3 errors show that either boundaries are marked
    incorrectly (since pronouns access elements in a
    closed discourse segment),
  • or perhaps the collapsing routine is too harsh
  • Small corpus size
  • Hard to draw definite conclusions given only 3
    criss-crossed errors
  • Need more data for statistical evaluations

27
Issues
  • ES Model has advantage over QUD of being easiest
    to automate, but fares worse since it takes into
    account a small window of utterances (extremely
    shallow)
  • QUD model can be semi-automated (detecting
    question starts is easy) but detecting ends and
    type are harder
  • QUD could definitely be improved by taking into
    account plan initiations and suggestions, instead
    of limiting to questions only, but tradeoff is
    reliability