1
Discourse Annotation for Improving Spoken
Dialogue Systems
  • Joel Tetreault, Mary Swift, Preethum Prithviraj,
    Myroslava Dzikovska, James Allen
  • University of Rochester
  • Department of Computer Science
  • ACL Workshop on Discourse Annotation
  • July 25, 2004

2
Reference in Spoken Dialogue
  • Resolving anaphoric expressions correctly is
    critical in task-oriented domains
  • Makes conversation easier for humans
  • The reference resolution module (RRM) provides
    feedback to other components in the system,
    e.g., incremental parsing and the interpretation
    module
  • We investigate how to improve the RRM
  • Does deep semantic information provide an
    improvement over syntactic approaches?
  • Discourse structure could be effective in
    reducing the search space of antecedents and
    improving accuracy (Grosz and Sidner, 1986)

3
Goal
  • Construct a linguistically rich parsed corpus to
    test algorithms and theories on reference in
    spoken dialogue, to provide overall system
    improvement
  • Implicit roles
  • Paucity of empirical work on reference in spoken
    dialogue (Byron and Stent, 1998; Eckert and
    Strube, 2000)

4
Outline
  • Corpus Construction
  • Parsing Monroe Domain
  • Reference Annotation
  • Dialogue Structure Annotation
  • Results
  • Personal pronoun evaluation
  • Dialogue Structure
  • Summary

5
Parsing Monroe Domain
  • Domain: the Monroe Corpus, 20 transcriptions
    (Stent, 2001) of human subjects collaborating on
    emergency-rescue 911 tasks
  • Each dialogue was at least 10 minutes long, and
    most were over 300 utterances long
  • Work presented here focuses on 5 of the
    dialogues (1756 utterances)
  • Goals: develop a corpus of sentences parsed with
    rich syntactic, semantic, and discourse
    information, in order to:
  • Improve TRIPS parser (Swift et al., 2004)
  • Train statistical parser for comparison with
    existing parser
  • Develop incremental parser (Stoness et al., 2004)
  • Develop automated techniques for marking repairs

6
Parser information for Reference
  • Rich parser output is helpful for discourse
    annotation and reference resolution
  • Referring expressions identified (pronouns, NPs,
    implicit pronouns (impros))
  • Verb roles and temporal information (tense,
    aspect) identified
  • Noun phrases have semantic information associated
    with them
  • Speech act information (question, acknowledgment)
  • Discourse markers (so, but)
  • Semi-automatic annotation increases reliability

7
Monroe Corpus Example
  UTT     SPK  SA        TEXT
  Utt53   S    TELL      and so we're going to take an ambulance
                         from saint mary's hospital
  Utt54   U    TELL      oh you never told me about the ambulances
  Utt55   U    WH-QU     how many do you have
  Utt56   S    TELL      there's one at saint mary's hospital and
                         two at rochester general hospital
  Utt57   U    IDENTIFY  two
  Utt58   U    CONFIRM   okay
  Utt59   S    TELL      and we're going to take an ambulance from
                         saint mary's to east main street
  Utt60   S    CCA       and that is as far as i have planned
  Utt61   U    CONFIRM   okay
  Utt62A  U    CONFIRM   okay

8
TRIPS Parser
  • Broad-coverage, deep parser
  • Uses a bottom-up algorithm with a CFG and a
    domain-independent ontology combined with a
    domain model
  • Flat, unscoped logical form (LF) with events and
    labeled semantic roles based on FrameNet
  • Semantic information for noun phrases based on
    EuroWordNet

9
Semantics Example: an ambulance
  (TERM VAR V213818
    LF (A V213818 (LF::LAND-VEHICLE W::AMBULANCE)
        INPUT (AN AMBULANCE))
    SEM (F::PHYS-OBJ
         (SPATIAL-ABSTRACTION SPATIAL-POINT)
         (GROUP -)
         (MOBILITY LAND-MOVABLE)
         (FORM ENCLOSURE)
         (ORIGIN ARTIFACT)
         (OBJECT-FUNCTION VEHICLE)
         (INTENTIONAL -)
         (INFORMATION -)
         (CONTAINER (OR -))
         (TRAJECTORY -)))

10
Semantic Representation for them
  • and then send them to Strong Hospital
  (TERM VAR V3337536
    LF (PRO V3337536
        (SET-OF (LF::REFERENTIAL-SEM W::THEM)))
    SEM (F::PHYS-OBJ (F::MOBILITY F::MOVABLE)))
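These SEM feature vectors are what let semantics constrain reference: a candidate antecedent is plausible for them only if its features are compatible with the pronoun's. Below is a minimal Python sketch of such a compatibility filter; the tiny subtype table and the matching rule are illustrative assumptions, not the TRIPS parser's actual unifier.

  # Sketch of SEM-feature compatibility filtering for pronoun
  # resolution. SUBTYPES and the matching rule are assumptions.
  SUBTYPES = {
      "MOVABLE": {"LAND-MOVABLE", "AIR-MOVABLE"},  # hypothetical hierarchy
  }

  def value_matches(required, actual):
      return actual == required or actual in SUBTYPES.get(required, set())

  def sem_compatible(pronoun_sem, candidate_sem):
      """Candidate is compatible if every feature the pronoun constrains
      is unspecified on the candidate or subsumed by the pronoun's value."""
      return all(
          feat not in candidate_sem or value_matches(val, candidate_sem[feat])
          for feat, val in pronoun_sem.items()
      )

  # "them" (this slide): a set of physical, movable objects
  them_sem = {"TYPE": "PHYS-OBJ", "MOBILITY": "MOVABLE"}
  # "an ambulance" (previous slide): physical, land-movable -> compatible
  ambulance_sem = {"TYPE": "PHYS-OBJ", "MOBILITY": "LAND-MOVABLE"}
  assert sem_compatible(them_sem, ambulance_sem)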

11
Corpus Construction
  • Mark sentence status (ungrammatical, incomplete,
    conjoined) and mark speech repairs
  • Parse with domain-specific semantic restrictions
    for better coverage
  • Handcheck sentences, marking GOOD or BAD
  • Criterion for GOOD: both syntax and semantics
    must be correct
  • Update parser to cover BAD cases
  • Reparse and repeat handchecking

[Workflow diagram: Data Collection → Corpus Annotation → Run Parser → Manual Update → Parser Update → Reparse/Merge]
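Read as a procedure, the cycle above iterates until parser coverage stops improving. Here is a schematic Python sketch; parse, handcheck, and update_grammar are hypothetical stand-ins for the TRIPS parser, the human GOOD/BAD check, and the grammar update.

  def annotate_corpus(sentences, parse, handcheck, update_grammar,
                      max_rounds=5):
      """Parse -> handcheck -> update parser -> reparse, as on this
      slide. Returns GOOD sentences plus any still uncovered."""
      pending, gold = list(sentences), []
      for _ in range(max_rounds):
          if not pending:
              break
          verdicts = [(s, handcheck(parse(s))) for s in pending]
          gold += [s for s, v in verdicts if v == "GOOD"]
          pending = [s for s, v in verdicts if v == "BAD"]
          if pending:
              update_grammar(pending)  # extend coverage, then reparse
      return gold, pending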
12
Current Coverage
  Corpus    Good (%)   Good    Bad    N/A   Total
  S2          90.8      325     34     37     405
  S4          76.1      246     78     61     388
  S12         89.9      151     17     21     189
  S16         84.2      298     56     29     383
  S17         85.2      311     54     26     392
  Overall     84.1     1331    239    174    1757
13
Reference Annotation
  • Annotated dialogues for reference with
    undergraduate researchers (created a Java tool,
    PronounTool)
  • Markables determined by LF terms
  • Identification numbers determined by VAR field
    of LF term
  • Used stand-off file to encode what each pronoun
    refers to (refers-to) and the relation between
    pronoun and antecedent (relation)
  • Post-processing phase assigns a unique
    identification number to each coreference chain
    (sketched after this list)
  • Also annotated coreference between definite noun
    phrases
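A minimal sketch of what one stand-off record and the chain-numbering post-process could look like; the concrete field layout is an assumption, not the PronounTool file format.

  from dataclasses import dataclass

  @dataclass
  class PronounAnnotation:
      markable_id: str   # VAR field of the LF term, e.g. "V3337536"
      refers_to: str     # VAR of the antecedent term
      relation: str      # e.g. "IDENTITY", "FUNCTIONAL"
      chain_id: int = -1 # coreference chain id, set in post-processing

  def assign_chains(annotations):
      """Post-processing: give each coreference chain a unique id by
      taking the union of refers-to links (simple union-find)."""
      parent = {}
      def find(x):
          parent.setdefault(x, x)
          if parent[x] != x:
              parent[x] = find(parent[x])
          return parent[x]
      for a in annotations:
          parent[find(a.markable_id)] = find(a.refers_to)
      roots = {}
      for a in annotations:
          a.chain_id = roots.setdefault(find(a.markable_id), len(roots))
      return annotations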

14
Reference Annotation
  • Used a slightly modified MATE scheme; pronouns
    divided into the following types:
  • IDENTITY (Coreference) (278)
  • FUNCTIONAL (20)
  • PROPOSITION/D.DEIXIS (41)
  • ACTION/EVENT (22)
  • INDEXICAL (417)
  • EXPLETIVE (97)
  • DIFFICULT (5)

15
Dialogue Structure
  • How to integrate discourse structure into a
    reference module? Is it worth it?
  • Shallow techniques may work better; a
    fine-grained embedding may not be necessary to
    improve reference resolution
  • Implemented QUD-based technique and Dialogue Act
    model (Eckert and Strube, 2000)
  • Annotated in a stand-off file

16
Literal QUD
  • Questions Under Discussion (Craige Roberts,
    Jonathan Ginzburg): questions or modals can be
    viewed as creating a discourse segment
  • Result: questions provide only a shallow
    discourse structuring, but that may be enough to
    improve performance
  • Entities in the main QUD segment can be viewed
    as the topic
  • Segment is closed when the question is answered
    (cued by acknowledgement sequences or a change
    in the entities used)
  • Only entities from the answer and entities in
    the question are accessible
  • Can be used in TRIPS to reduce the search space
    of entities by setting the context size (see the
    sketch after this list)
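A minimal sketch of QUD-based search-space reduction: while a question segment is open, only entities from the question and its answer are offered to the resolver. The data structures here are assumptions, not the TRIPS implementation.

  class QUDContext:
      def __init__(self):
          self.segments = []  # stack of open question segments

      def open_question(self, question_entities):
          self.segments.append(list(question_entities))

      def add_answer(self, answer_entities):
          if self.segments:
              self.segments[-1].extend(answer_entities)

      def close_question(self):  # question answered: pop the segment
          if self.segments:
              self.segments.pop()

      def accessible(self, full_history):
          """Inside a QUD, restrict antecedent candidates to the open
          segment; otherwise fall back to the full history."""
          return self.segments[-1] if self.segments else full_history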

17
QUD Annotation Scheme
  • For each question, annotate:
  • Start utterance
  • End utterance
  • Type (aside, repeated question, unanswered, nil)

18
QUD
  • Issue 1: questions are easy to detect (use
    speech-act information), but how do you know a
    question has been answered?
  • Cue words, multiple acknowledgements, and
    changes in the entities discussed provide strong
    clues that a question is finishing (sketched
    below)
  • Issue 2: what is more salient to a pronoun
    inside a QUD: the QUD topic or a more recent
    entity?
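One way to operationalize those clues is a small heuristic like the following; the cue-word list and the two-acknowledgement threshold are assumptions chosen to mirror the clues named above.

  ACK_WORDS = {"okay", "right", "yeah", "mhm"}

  def question_closed(utterances, open_entities):
      """Guess that a QUD segment ends after consecutive
      acknowledgements or once the entities discussed change."""
      acks = 0
      for utt in utterances:  # utt: {"text": str, "entities": list}
          tokens = set(utt["text"].lower().split())
          acks = acks + 1 if tokens & ACK_WORDS else 0
          if acks >= 2:  # multiple acknowledgements
              return True
          if utt["entities"] and not set(utt["entities"]) & set(open_entities):
              return True  # only new entities: likely a topic shift
      return False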

19
Dialogue Act Segmentation
  • Utterances that are not acknowledged by the
    listener may not be in common ground and thus not
    accessible to pronominal reference
  • Evaluation showed improvement for pronouns
    referring to abstract entities, and strong
    annotator reliability
  • Each utterance is marked as I (initiation:
    contains content), A (acknowledgment), or C (a
    combination of the above); see the sketch below
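A minimal sketch of the resulting accessibility constraint: an utterance's entities enter the common ground only once the other speaker acknowledges them. The tuple format is an assumption; tagging utterances as I/A/C is done by the annotators.

  def common_ground(tagged_utterances):
      """tagged_utterances: list of (tag, speaker, entities), tag in
      {'I', 'A', 'C'}. Returns entities accessible to pronouns."""
      ground, pending = [], []
      for tag, speaker, entities in tagged_utterances:
          if tag == "I":    # initiation: content, not yet grounded
              pending.append((speaker, entities))
          elif tag == "A":  # acknowledgment grounds prior content
              ground += [e for s, ents in pending if s != speaker
                         for e in ents]
              pending = []
          elif tag == "C":  # combination: grounds prior, adds new content
              ground += [e for s, ents in pending if s != speaker
                         for e in ents]
              pending = [(speaker, entities)]
      return ground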

20
Results
  • Incorporating semantics into the reference
    resolution algorithm (LRC) improves performance
    from 61.5 to 66.9 (CATALOG 04)
  • Preliminary QUD results show an additional boost
    to 67.3 (DAARC 04)
  • Eckert and Strube model, automated: 63.4
  • Eckert and Strube model, manual: 60.0

21
Issues
  • Inter-annotator agreement for QUD annotation
  • Segment ends are hardest to sync on
  • Ungrammatical and fragmented utterances
  • Parse automatically or manually?
  • Small corpus size: need more data for
    statistical evaluations
  • Parser freeze? It is important for annotators to
    stay abreast of the latest changes

22
Summary
  • Semi-automated parsing process to produce
    reliable discourse annotation
  • Discourse annotation done manually, but automated
    data helps guide manual annotation
  • Result: a spoken dialogue corpus with rich
    linguistic data