
1
Turn-Yielding Cues in Task-Oriented Dialogue
  • Agustín Gravano (1,2)
  • Julia Hirschberg (1)
  • (1) Columbia University, New York, USA
  • (2) Universidad de Buenos Aires, Argentina

2
Interactive Voice Response Systems
Introduction
  • Quickly spreading.
  • Uncomfortable, awkward.
  • ASR and TTS account for most IVR problems.
  • Other problems revealed.
  • Coordination of system-user exchanges.
  • Long pauses after user turns; interruptions.
  • Modeling turn-taking behavior should lead to
    improved system-user coordination.

3
Goal
Introduction
  • Learn when the speaker is likely to end her/his
    conversational turn.
  • Find turn-yielding cues.
  • Cues displayed by the speaker when approaching a
    potential turn boundary.
  • This should improve the coordination of IVRs:
  • Speech understanding: detect the end of the
    user's turn.
  • Speech generation: display cues signalling the
    end of the system's turn.

4
Talk Outline
  • Previous work
  • Material
  • Method
  • Results
  • Conclusions

5
Previous Work on Turn-Taking
  • Duncan 1972, 1973, 1974, inter alia.
  • Hypothesized 6 turn-yielding cues in face-to-face
    dialogue.
  • Conjectured a linear relation between the number
    of displayed cues and the likelihood of a
    turn-taking attempt.
  • Studies formalized and verified some of Duncan's
    hypotheses [ForTho96; WenSie03; CutPea86;
    WicCas01].
  • Implementations of turn-boundary detection.
  • Simulations: [Fer et al. 02, 03; Edl et al. 05;
    Sch06; Att et al. 08; Bau08].
  • Actual systems: Let's Go! [RauEsk08].
  • Exploiting turn-yielding cues improves
    performance.

6
Columbia Games Corpus
Material
  • 12 task-oriented spontaneous dialogues.
  • Standard American English.
  • 13 subjects: 6 female, 7 male.
  • Series of collaborative computer games.
  • No eye contact. No speech restrictions.
  • 9 hours of dialogue.
  • Manual orthographic transcription, alignment.
  • Manual prosodic annotations (ToBI).

7
Columbia Games Corpus
Material
  • Player 1: Describer
  • Player 2: Follower

8
Turn-Yielding Cues
  • Cues displayed by the speaker when approaching a
    potential turn boundary.

9
Method
Turn-Yielding Cues
  • IPU (Inter-Pausal Unit): maximal sequence of
    words from the same speaker surrounded by silence
    of at least 50 ms.
  • Smooth switch: speaker A finishes her utterance;
    speaker B takes the turn with no overlapping
    speech.
  • Trained annotators distinguished Smooth switches
    from Interruptions and Backchannels using a
    scheme based on Ferguson 1977, Beattie 1982.
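
As an illustration of the IPU definition, here is a minimal sketch that groups one speaker's time-aligned words into IPUs. It reads the 50 ms figure on the slide as a minimum silence duration; the `Word` record and the `segment_ipus` function are hypothetical names, not part of the corpus tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    start: float  # word onset, in seconds
    end: float    # word offset, in seconds
    text: str


MIN_SILENCE = 0.050  # 50 ms, read here as the minimum silence between IPUs


def segment_ipus(words: List[Word], min_silence: float = MIN_SILENCE) -> List[List[Word]]:
    """Group a single speaker's words into inter-pausal units.

    A new IPU starts whenever the silence since the previous word
    is at least `min_silence` seconds long.
    """
    ipus: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.start):
        if ipus and word.start - ipus[-1][-1].end < min_silence:
            ipus[-1].append(word)   # gap too short: same IPU
        else:
            ipus.append([word])     # first word, or a long enough silence
    return ipus


if __name__ == "__main__":
    words = [Word(0.00, 0.30, "the"), Word(0.32, 0.60, "blue"), Word(0.90, 1.20, "lion")]
    for ipu in segment_ipus(words):
        print(" ".join(w.text for w in ipu))
```

The sketch covers one speaker at a time; deciding whether an IPU precedes a Smooth switch, an Interruption, or a Backchannel was done by trained annotators in the study and is not attempted here.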

10
Method
Turn-Yielding Cues
  • To find turn-yielding cues, we compare:
  • IPUs preceding Holds (the same speaker
    continues),
  • IPUs preceding Smooth switches.
  • 200 features: acoustic, prosodic, lexical,
    syntactic.

11
Individual Cues
Turn-Yielding Cues
  • Final intonation
  • Falling (L-L%) or high-rising (H-H%).
  • Faster speaking rate.
  • Reduction of final lengthening.
  • Lower intensity level.
  • Lower pitch level.
  • Higher jitter, shimmer, NHR.
  • Related to perception of voice quality.
  • Longer IPU duration (seconds and words).

12
Individual Cues
Turn-Yielding Cues
  • Textual completion (independent of intonation).
  • (1) Manually annotated a portion of the data.
  • Labelers read up to the end of a target IPU (no
    right context) and judged whether it could
    constitute a complete utterance. 400 tokens;
    κ = 0.81.
  • (2) Trained an SVM classifier on 19 lexical and
    syntactic features. Accuracy: 80%. Majority-class
    baseline: 55%. Human agreement: 91%.
  • (3) Labeled all IPUs in the corpus with the SVM
    model.
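
The agreement figure reported for the manual annotation is a kappa statistic. The slide does not say which variant was used, so the sketch below assumes the simplest case, Cohen's kappa for two labelers giving binary complete/incomplete judgments; the function name and the toy labels are illustrative only.

```python
from typing import Sequence


def cohens_kappa(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Cohen's kappa for two labelers and binary labels (1 = complete, 0 = incomplete)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of tokens with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each labeler's rate of "complete" judgments.
    rate_a = sum(labels_a) / n
    rate_b = sum(labels_b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # both labelers always give the same single label
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    labeler_1 = [1, 1, 0, 0]
    labeler_2 = [1, 1, 0, 1]
    print(cohens_kappa(labeler_1, labeler_2))
```

With more than two labelers a multi-rater statistic such as Fleiss' kappa would be the usual choice.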

13
Individual Cues
Turn-Yielding Cues
  • Final intonation: L-L% or H-H%.
  • Faster speaking rate.
  • Lower intensity level.
  • Lower pitch level.
  • Higher jitter, shimmer, NHR.
  • Longer IPU duration.
  • Textual completion.

14
Defining Presence of a Cue
Turn-Yielding Cues
  • 2-3 representative features for each cue:

Cue                  Features
Final intonation     Abs. pitch slope over final 200 ms, 300 ms.
Speaking rate        Syllables/sec, phonemes/sec over IPU.
Intensity level      Mean intensity over final 500 ms, 1000 ms.
Pitch level          Mean pitch over final 500 ms, 1000 ms.
Voice quality        Jitter, shimmer, NHR over final 500 ms.
IPU duration         Duration in ms, and in number of words.
Textual completion   Complete vs. incomplete (binary).

  • Define presence/absence based on whether the
    value is closer to the mean before Smooth
    switches (S) or to the mean before Holds (H).
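
The presence rule can be sketched as follows, with S standing for smooth switches and H for holds. This is a minimal illustration under stated assumptions: one numeric feature per cue (the slide does not say how the 2-3 features of a cue are combined), made-up mean values, and hypothetical function and key names. Textual completion is binary and would be handled separately.

```python
from typing import Dict, Tuple


def cue_present(value: float, mean_before_s: float, mean_before_h: float) -> bool:
    """True if the value lies closer to the mean before S than to the mean before H."""
    return abs(value - mean_before_s) < abs(value - mean_before_h)


def count_cues(ipu_features: Dict[str, float],
               means: Dict[str, Tuple[float, float]]) -> int:
    """Count the cues present in one IPU.

    `means` maps a feature name to (mean before S, mean before H).
    Features missing from the IPU are skipped.
    """
    return sum(
        cue_present(ipu_features[name], mean_s, mean_h)
        for name, (mean_s, mean_h) in means.items()
        if name in ipu_features
    )


if __name__ == "__main__":
    # Hypothetical corpus means, for illustration only.
    means = {"speaking_rate": (5.5, 5.0), "mean_intensity": (60.0, 63.0)}
    ipu = {"speaking_rate": 5.6, "mean_intensity": 64.0}
    print(count_cues(ipu, means))
```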

15
Top Frequencies of Complex Cues
Legend: digit = cue present; dot = cue absent.
Turn-yielding cues: 1 Final intonation; 2 Speaking rate; 3 Intensity level; 4 Pitch level; 5 IPU duration; 6 Voice quality; 7 Completion.

16
Combined Cues
Turn-Yielding Cues
[Plot: percentage of turn-taking attempts (y-axis) against number of cues conjointly displayed (x-axis); r² = 0.969]

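
A plot of this kind could be derived roughly as in the sketch below, which tabulates the percentage of turn-taking attempts for each cue count and then fits a line. It assumes the reported r² comes from an ordinary least-squares fit; the function names and the toy data are illustrative and do not reproduce the corpus results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def attempt_rate_by_cue_count(ipus: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map number of cues -> percentage of IPUs followed by a turn-taking attempt."""
    totals: Dict[int, int] = defaultdict(int)
    attempts: Dict[int, int] = defaultdict(int)
    for n_cues, attempted in ipus:
        totals[n_cues] += 1
        attempts[n_cues] += int(attempted)
    return {n: 100.0 * attempts[n] / totals[n] for n in sorted(totals)}


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares; returns (slope, intercept, r squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # (number of cues in the IPU, was it followed by a turn-taking attempt?)
    toy = [(0, False), (0, False), (1, True), (1, False), (2, True), (2, True)]
    rates = attempt_rate_by_cue_count(toy)
    print(rates, linear_fit(list(rates.keys()), list(rates.values())))
```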
17
IVR Systems
Turn-Yielding Cues
  • After each IPU from the user:
  • if estimated likelihood > threshold
  • then take the turn.
  • To signal the end of a system's turn:
  • Include as many cues as possible in the system's
    final IPU.
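
The decision rule for the user side can be written as a short sketch. The slide does not specify how the likelihood is estimated; the code assumes a linear function of the number of cues found in the user's last IPU, clipped to [0, 1]. The slope, intercept, and threshold values are placeholders, not figures from the study.

```python
def turn_end_likelihood(n_cues: int, slope: float, intercept: float) -> float:
    """Linear estimate of the likelihood that the user is yielding the turn, clipped to [0, 1]."""
    return min(1.0, max(0.0, intercept + slope * n_cues))


def should_take_turn(n_cues: int, threshold: float, slope: float, intercept: float) -> bool:
    """Apply the rule: take the turn if the estimated likelihood exceeds the threshold."""
    return turn_end_likelihood(n_cues, slope, intercept) > threshold


if __name__ == "__main__":
    # Placeholder parameters; a real system would fit them on annotated dialogues.
    SLOPE, INTERCEPT, THRESHOLD = 0.12, 0.05, 0.5
    for n in range(8):
        decision = "take turn" if should_take_turn(n, THRESHOLD, SLOPE, INTERCEPT) else "wait"
        print(n, decision)
```

A higher threshold makes the system interrupt less often at the cost of longer pauses after user turns, which are the two coordination problems named in the introduction.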

18
Summary
  • Study of turn-yielding cues.
  • Objective, automatically computable.
  • Combined cues.
  • Improve turn-taking decisions of IVR systems.
  • Results drawn from task-oriented dialogues.
  • Not necessarily generalizable.
  • Suitable for most IVR domains.
  • Interspeech 2009: study of backchannel-inviting
    cues.

19
Special thanks to
  • Julia Hirschberg
  • Thesis Committee Members
  • Maxine Eskenazi, Kathy McKeown, Becky Passonneau,
    Amanda Stent.
  • Speech Lab at Columbia University
  • Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob
    Coyne, Frank Enos, Martin Jansche, Jackson
    Liscombe, Sameer Maskey, Andrew Rosenberg.
  • Collaborators
  • Gregory Ward and Elisa Sneed German (Northwestern
    U); Ani Nenkova (UPenn); Héctor Chávez, David
    Elson, Michel Galley, Enrique Henestroza, Hanae
    Koiso, Shira Mitchell, Michael Mulley, Kristen
    Parton, Ilia Vovsha, Lauren Wilcox.