Challenges in Dialogue - PowerPoint PPT Presentation

1
Challenges in Dialogue
  • Discourse and Dialogue
  • CMSC 35900-1
  • October 27, 2006

2
Roadmap
  • Issues in Dialogue
  • Dialogue vs General Discourse
  • Dialogue Acts
  • Modeling
  • Recognition and Interpretation
  • Dialogue Management for Computational Agents

3
Dialogue vs General Discourse
  • Key contrast: Two or more speakers
  • Primary focus on speech
  • Issues in multi-party spoken dialogue
  • Turn-taking: who speaks next, when?
  • Collaboration: clarification, feedback, ...
  • Disfluencies
  • Adjacency pairs, dialogue acts

4
Turn-Taking
  • Multi-party discourse
  • Need to trade off speaker/hearer roles
  • Interpret reference from sequential utterances
  • When?
  • End of sentence?
  • No: multi-utterance turns
  • Silence?
  • No: little silence in smooth dialogue (< 250 ms)
  • When other starts speaking?
  • No: relatively little overlap face-to-face (5%)

5
Turn-taking: When
  • Rule-governed behavior
  • Possibly multiple legal turn change times
  • Aka transition-relevance places (TRP)
  • Generally at utterance boundaries
  • Utterance not necessarily sentence
  • In fact, utterance/sentence boundaries not
    obvious in speech
  • Don't necessarily pause between sentences
  • Automatic utterance boundary detection
  • Cue words (okay, so, ...), POS sequences, prosody

6
Turn-taking: Who & How
  • At each TRP in each turn (Sacks et al., 1974)
  • If speaker has selected A to speak, A must take
    floor
  • If speaker has selected no one to speak, anyone
    can
  • If no one else takes the turn, the speaker can
  • Selecting speaker A:
  • By explicit/implicit mention: "What about it, Bob?"
  • By gaze, function
  • Selecting others: questions, greetings, closing
  • (Traum et al., 2003)
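
The three allocation rules listed on this slide can be read as an ordered decision procedure applied at each TRP. The sketch below is one hedged way to write that down; the function name `next_speaker` and the idea of passing an ordered list of would-be self-selectors are assumptions made for illustration, not part of the original model.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 self_selectors: Sequence[str]) -> str:
    """Apply the three turn-allocation rules at a single TRP.

    current        -- participant who holds the floor now
    selected       -- participant the current speaker selected, if any
    self_selectors -- others who try to start speaking, in order of onset
                      (hypothetical input, assumed for this sketch)
    """
    # Rule 1: the current speaker selected A, so A must take the floor.
    if selected is not None:
        return selected
    # Rule 2: nobody was selected, so anyone else may take the turn;
    # here the first participant to start speaking wins.
    for candidate in self_selectors:
        if candidate != current:
            return candidate
    # Rule 3: no one else took the turn, so the current speaker may continue.
    return current
```

For instance, `next_speaker("Ann", "Bob", ["Cat"])` applies the first rule, while an empty `self_selectors` list with no selection falls through to the third.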

7
Turn-taking in HCI
  • Human turn end
  • Detected by 250ms silence
  • System turn end
  • Signaled by end of speech
  • Indicated by any human sound
  • Barge-in
  • Continued attention
  • No signal
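
Detecting the end of the human turn by a fixed stretch of silence can be sketched as a simple counter over voice-activity frames. This is a minimal illustration, assuming the input is already a per-frame speech/non-speech decision and that frames are 10 ms long; both the frame size and the function name `detect_turn_end` are assumptions, and only the 250 ms threshold comes from the slide.

```python
from typing import List, Optional


def detect_turn_end(is_speech: List[bool],
                    frame_ms: int = 10,
                    silence_ms: int = 250) -> Optional[int]:
    """Return the index of the frame at which the turn is judged to end,
    or None if no sufficiently long silence follows any speech."""
    needed = silence_ms // frame_ms  # consecutive silent frames required
    heard_speech = False
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0          # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i            # threshold reached: declare turn end
    return None
```

A fixed threshold like this treats any long within-turn pause as a turn end, which is one reason the later slides look at richer cues.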

8
Gesture, Gaze & Voice
  • Range of gestural signals
  • Head (nod, shake), shoulder, hand, leg, foot
    movements; facial expressions; postures;
    artifacts
  • Align with syllables
  • Units: phonemic clause change
  • Study with recorded exchanges

9
Yielding the Floor
  • Turn change signal
  • Offer floor to auditor/hearer
  • Cues: pitch fall, lengthening, "but uh", end of
    gesture, amplitude drop + "uh", end of clause
  • Likelihood of change increases with more cues
  • Negated by any gesticulation
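
The two properties stated here (more cues make a turn change more likely, and gesticulation cancels the signal) can be captured by a toy scoring function. The sketch below only counts cues; it does not estimate a probability, and the cue identifiers and the name `yield_signal_strength` are labels invented for this illustration.

```python
from typing import Set

# Hypothetical identifiers for the turn-yielding cues named on the slide.
YIELD_CUES = {
    "pitch_fall",
    "lengthening",
    "but_uh",
    "end_of_gesture",
    "amplitude_drop_uh",
    "end_of_clause",
}


def yield_signal_strength(observed: Set[str], gesticulating: bool) -> int:
    """Count active turn-yielding cues; 0 means no turn signal."""
    # Any ongoing gesticulation negates the turn signal entirely.
    if gesticulating:
        return 0
    # Otherwise more recognized cues mean a stronger signal.
    return len(observed & YIELD_CUES)
```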

10
Taking the Floor
  • Speaker-state signal
  • Indicate becoming speaker
  • Occurs at beginning of turns
  • Cues
  • Shift in head direction
  • AND/OR
  • Start of gesture

11
Retaining the Floor
  • Within-turn signal
  • Still speaker: look at hearer at end of clause
  • Continuation signal
  • Still speaker: look away after within-turn/back-channel
  • Back-channel
  • mm-hm, okay, etc.; nods;
  • sentence completion; clarification request;
    restatement
  • NOT a turn signal: attention, agreement, confusion

12
Segmenting Turns
  • Speaker alone
  • Within-turn signal -> end of one unit
  • Continuation signal -> beginning of next unit
  • Joint signal
  • Speaker turn signal (end): auditor -> speaker,
    speaker -> auditor
  • Within-turn back-channel continuation
  • Back-channels signal understanding
  • Early back-channel continuation

13
Regaining Attention
  • Gaze & Disfluency
  • Disfluency: perturbation in speech
  • Silent pause, filled pause, restart
  • Gaze
  • Conversants don't stare at each other constantly
  • However, speaker expects to meet hearer's gaze
  • Confirm hearer's attention
  • Disfluency occurs when speaker realizes hearer is
    NOT attending
  • Pause until hearer begins gazing, or to request
    attention

14
Improving Human-Computer Turn-taking
  • Identifying cues to turn change and turn start
  • Meeting conversations
  • Recorded, natural research meetings
  • Multi-party
  • Overlapping speech
  • Units: Spurts, separated by 500 ms of silence
  • Can predict on-line likely turn end
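
Segmenting one speaker's channel into spurts can be sketched as merging speech intervals whose separating silence is shorter than the threshold. This is a minimal sketch assuming the speech intervals (start and end in milliseconds) are already available from some voice-activity detector; the name `segment_spurts` and the interval representation are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms) of a stretch of speech


def segment_spurts(speech: List[Interval],
                   min_gap_ms: int = 500) -> List[Interval]:
    """Merge speech intervals into spurts: a new spurt starts only after
    at least min_gap_ms of silence."""
    spurts: List[Interval] = []
    for start, end in sorted(speech):
        if spurts and start - spurts[-1][1] < min_gap_ms:
            # Silence is shorter than the threshold: extend current spurt.
            prev_start, prev_end = spurts[-1]
            spurts[-1] = (prev_start, max(prev_end, end))
        else:
            # Long enough silence (or first interval): open a new spurt.
            spurts.append((start, end))
    return spurts
```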

15
Text + Prosody
  • Text sequence
  • Modeled as n-gram language model
  • Implement as HMM
  • Prosody
  • Duration, Pitch, Pause, Energy
  • Decision trees classify probability
  • Integrate LM + DT
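
One simple way to integrate a language-model score with a prosodic decision-tree score is log-linear interpolation of the two posterior distributions over boundary labels. The slide does not say which combination method was used, so the sketch below is a stand-in technique under stated assumptions: both models are assumed to output a probability per label, and `combine_posteriors`, `lm_weight`, and the label names are hypothetical.

```python
import math
from typing import Dict

FLOOR = 1e-12  # avoids log(0) for labels a model rules out entirely


def combine_posteriors(lm_post: Dict[str, float],
                       dt_post: Dict[str, float],
                       lm_weight: float = 0.5) -> Dict[str, float]:
    """Log-linearly interpolate two posteriors and renormalize."""
    scores = {}
    for label in lm_post:
        lm_p = max(lm_post[label], FLOOR)
        dt_p = max(dt_post.get(label, 0.0), FLOOR)
        scores[label] = (lm_weight * math.log(lm_p)
                         + (1.0 - lm_weight) * math.log(dt_p))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exp_scores.values())
    return {k: v / total for k, v in exp_scores.items()}
```

The weight would normally be tuned on held-out data; 0.5 here is only a placeholder.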

16
Decision Trees
[Figure: example decision tree. Nodes A to G; branch tests X=t / X=f and Y>1, Y<2, Y<1, Y>2; leaf labels None, Sentence End, Sentence End, Disfluency.]

17
Interpreting Breaks
  • For each inter-word position
  • Is it a disfluency, sentence end, or
    continuation?
  • Key features
  • Pause duration, vowel duration
  • 62% accuracy vs. 50% chance baseline
  • 90% overall
  • Best: combines LM + DT

18
Jump-in Points
  • (Used) Possible turn changes
  • Points WITHIN spurt where new speaker starts
  • Key features
  • Pause duration, low energy, pitch fall
  • Accuracy: 65% vs. 50% baseline
  • Performance depends only on preceding prosodic
    features

19
Jump-in Features
  • Do people speak differently when jumping in?
  • Differ from regular turn starts?
  • Examine only first words of turns
  • No LM
  • Key features
  • Raised pitch, raised amplitude
  • Accuracy: 77% vs. 50% baseline
  • Prosody only

20
Collaborative Communication
  • Speaker tries to establish and add to common
    ground (mutual belief)
  • Presumed a joint, collaborative activity
  • Make sure both mutually believe the same thing
  • Hearer can acknowledge/accept/disagree
  • Clark & Schaefer: Degrees of grounding
  • Display, Demonstrate/Reformulate,
    Acknowledgement, Next relevant contribution,
    Continued attention

21
Computational Models
  • (Traum et al.) Revised for computation
  • Involves both speaker and hearer
  • Initiate, Continue, Acknowledge, Repair, Request
    Repair, etc.
  • Common phenomena
  • Back-channel: uh-huh, okay, etc.
  • Allows hearer to signal continued attention,
    acknowledgement
  • WITHOUT taking the turn
  • Requests for repair common in human-human
  • Even more common in human-computer dialogue
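
The grounding acts named on this slide can be tracked with a small finite-state machine over one unit of content. The sketch below is a deliberately simplified illustration and is not the automaton from Traum's work: the state names, the transition table, and the function `track_grounding` are all assumptions, and it ignores who performs each act.

```python
from typing import Dict, Iterable, Tuple

# Simplified, hypothetical transition table: (state, act) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "initiate"): "pending",
    ("pending", "continue"): "pending",
    ("pending", "repair"): "pending",
    ("pending", "request_repair"): "needs_repair",
    ("needs_repair", "repair"): "pending",
    ("pending", "acknowledge"): "grounded",
}


def track_grounding(acts: Iterable[str]) -> str:
    """Run a sequence of grounding acts and return the final state."""
    state = "start"
    for act in acts:
        key = (state, act)
        if key not in TRANSITIONS:
            raise ValueError(f"act {act!r} not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

In this toy version, content counts as part of the common ground only once the final state is `"grounded"`; a back-channel such as "uh-huh" would be treated as an `acknowledge` act.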

22
Implicature & Grice's Maxims
  • Inferences licensed by utterances
  • Grice's Maxims
  • Quantity: Be as informative as required
  • "There are two classes per week" implies not 1, or 5
  • Quality: Be truthful; don't lie, ...
  • Relevance: Be relevant
  • Manner: Be perspicuous
  • Don't be obscure, ambiguous, prolix, or
    disorderly
  • Flouting maxims: Consciously violate for effect
  • Humor, emphasis, ...

23
Speech & Dialogue Acts
  • Speech Acts (Austin, Searle)
  • Doing things with words
  • E.g. performatives: "I dub thee Sir Lancelot"
  • Illocutionary acts: act of asking, answering,
    promising, etc. in saying an utterance
  • Include: Assertives ("I propose to ..."),
    Directives ("Stop that"), Commissives ("I
    promise"), Expressives ("Thank you"), Declarations
    ("You're fired")

24
Dialogue Acts
  • (aka Conversational moves)
  • Enriched set of speech acts
  • Capture full range of conversational functions
  • Adjacency pairs: Many two-part structures
  • E.g. Question-Answer, Greeting-Greeting,
    Request-Grant, etc.
  • Paired for speaker-hearer dyads
  • Contrast with rhetorical relations in monologue

25
DAMSL
  • Dialogue Act Tagging framework
  • Adjacency pairs + grounding + repair
  • Forward looking functions
  • Statement, info-request, commit, closing, etc.
  • Backward looking functions
  • Focus on link to prior speaker utterance
  • Agreement, answer, accept, etc.
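
The split between forward looking and backward looking functions suggests a simple record type for a tagged utterance. The sketch below is an assumed representation, not the DAMSL specification: the class name `TaggedUtterance`, its fields, and the restriction to only the tags listed on this slide are choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Set

# Only the tags named on the slide; the real scheme has more.
FORWARD_TAGS = {"statement", "info-request", "commit", "closing"}
BACKWARD_TAGS = {"agreement", "answer", "accept"}


@dataclass
class TaggedUtterance:
    speaker: str
    text: str
    forward: Set[str] = field(default_factory=set)
    backward: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Reject tags outside the two small inventories above.
        unknown = ((self.forward - FORWARD_TAGS)
                   | (self.backward - BACKWARD_TAGS))
        if unknown:
            raise ValueError(f"unknown tags: {sorted(unknown)}")

    def responds_to_prior(self) -> bool:
        """True if the utterance links back to a prior utterance."""
        return bool(self.backward)


question = TaggedUtterance("A2", "And you're flying into what city?",
                           forward={"info-request"})
reply = TaggedUtterance("C3", "Seattle.",
                        forward={"statement"}, backward={"answer"})
```

An utterance can carry both kinds of function at once, as `reply` does: it answers the prior question and also makes a statement.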

26
Tagged Dialogue
[assert] C1: . . . I need to travel in May.
[info-req, ack] A1: And, what day in May did you want to travel?
[assert, answer] C2: OK, uh, I need to be there for a meeting that's from the 12th to the 15th.
[info-req, ack] A2: And you're flying into what city?
[assert, answer] C3: Seattle.
[info-req, ack] A3: And what time would you like to leave Pittsburgh?
[check, hold] C4: Uh hmm, I don't think there's many options for nonstop.
[accept, ack] A4: Right.
[assert] There's three non-stops today.
[info-req] C5: What are they?
[assert, open-option] A5: The first one departs PGH at 10:00am, arrives Seattle at 12:05 their time. The second flight departs PGH at 5:55pm, arrives Seattle at 8pm. And the last flight departs PGH at 8:15pm, arrives Seattle at 10:28pm.
[accept, ack] C6: OK, I'll take the 5ish flight on the night before, on the 11th.
[check, ack] A6: On the 11th?
[assert, ack] OK. Departing at 5:55pm, arrives Seattle at 8pm, U.S. Air flight 115.
[ack] C7: OK.

27
Dialogue Act Recognition
  • Goal: Identify dialogue act tag(s) from surface
    form
  • Challenge: Surface form can be ambiguous
  • "Can you X?": yes/no question, or info-request
  • "Flying on the 11th, at what time?": check,
    statement
  • Requires interpretation by hearer
  • Strategies: Plan inference, cue recognition

28
Plan-inference-based
  • Classic AI (BDI) planning framework
  • Model: Belief, Knowledge, Desire
  • Formal definition with predicate calculus
  • Axiomatization of plans and actions as well
  • STRIPS-style: Preconditions, Effects, Body
  • Rules for plan inference
  • Elegant, but ...
  • Labor-intensive rule, KB, heuristic development
  • Effectively AI-complete
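
A STRIPS-style action with preconditions, effects, and a body can be sketched as a small immutable record over sets of ground facts. This is only an illustration of the representation, assuming facts are plain strings; the class `Action`, the split of effects into add and delete sets, and the example predicates for an inform act are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

State = FrozenSet[str]  # a state is a set of ground facts


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str] = frozenset()
    body: Tuple[str, ...] = ()  # lower-level steps that realize the action

    def applicable(self, state: State) -> bool:
        # All preconditions must already hold in the state.
        return self.preconditions <= state

    def apply(self, state: State) -> State:
        if not self.applicable(state):
            raise ValueError(f"{self.name}: preconditions not met")
        # Remove deleted facts, then add the new ones.
        return (state - self.del_effects) | self.add_effects


# Hypothetical speech-act operator: speaker S informs hearer H of P.
inform = Action(
    name="inform(S,H,P)",
    preconditions=frozenset({"knows(S,P)", "wants(S,inform(S,H,P))"}),
    add_effects=frozenset({"knows(H,P)"}),
    body=("utter(S,H,P)",),
)

before = frozenset({"knows(S,P)", "wants(S,inform(S,H,P))"})
after = inform.apply(before)
```

Plan inference runs this kind of operator in reverse, reasoning from an observed utterance back to the beliefs and desires that would make it applicable, which is where the hand-built rules and heuristics come in.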

29
Cue-based Interpretation
  • Employs sets of features to identify
  • Words and collocations: "Please" -> request
  • Prosody: Rising pitch -> yes/no question
  • Conversational structure: prior act
  • Example: Check
  • Syntax: tag question ", right?"
  • Syntax + prosody: Fragment with rise
  • N-gram: $\hat{d} = \arg\max_{d} P(d)\,P(W \mid d)$
  • "So you", "sounds like", etc.
  • Details later.
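
The n-gram cue model picks the dialogue act d that maximizes P(d) times the probability of the words W given d. A minimal sketch follows, assuming a unigram approximation of P(W | d) with add-one smoothing; the slide does not specify the n-gram order or smoothing, and the tiny training set, the tag names, and the functions `train` and `classify` are invented for illustration only.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


def train(examples: List[Tuple[str, str]]):
    """Collect act counts and per-act word counts from (act, text) pairs."""
    act_counts: Counter = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for act, text in examples:
        words = text.lower().split()
        act_counts[act] += 1
        word_counts[act].update(words)
        vocab.update(words)
    return act_counts, word_counts, vocab


def classify(text: str, model) -> str:
    """Return argmax over d of log P(d) + sum of log P(w | d)."""
    act_counts, word_counts, vocab = model
    total_acts = sum(act_counts.values())
    best_act, best_score = None, -math.inf
    for act in sorted(act_counts):
        score = math.log(act_counts[act] / total_acts)      # log P(d)
        # Add-one smoothing, with one extra slot for unseen words.
        denom = sum(word_counts[act].values()) + len(vocab) + 1
        for w in text.lower().split():
            score += math.log((word_counts[act][w] + 1) / denom)
        if score > best_score:
            best_act, best_score = act, score
    return best_act


# Toy, made-up training data; real systems train on tagged corpora.
TOY_DATA = [
    ("check", "so you want to leave on the 11th"),
    ("check", "sounds like you need a nonstop"),
    ("request", "please book the flight"),
    ("request", "please send me the itinerary"),
    ("statement", "i need to travel in may"),
    ("statement", "there are three flights today"),
]
model = train(TOY_DATA)
```

With this toy data, cue words such as "so you" pull an utterance toward `check` and "please" toward `request`, mirroring the cue lists on the slide.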

30
From Human to Computer
  • Conversational agents
  • Systems that (try to) participate in dialogues
  • Examples: Directory assistance, travel info,
    weather, restaurant and navigation info
  • Issues
  • Limited understanding: ASR errors, interpretation
  • Computational costs
  • Broader coverage -> slower, less accurate

31
Dialogue Manager Tradeoffs
  • Flexibility vs Simplicity/Predictability
  • System vs User vs Mixed Initiative
  • Order of dialogue interaction
  • Conversational naturalness vs Accuracy
  • Cost of model construction, generalization,
    learning, etc
  • Models: FST, Frame-based, HMM, BDI
  • Evaluation frameworks
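
Of the model families listed, the frame-based manager is the easiest to sketch: the system keeps a frame of slots, fills whatever the user supplies, and asks about the first slot still empty. The example below is a minimal illustration for a travel task; the slot names, prompts, and the functions `update` and `next_prompt` are assumptions, and language understanding is assumed to have already produced slot/value pairs.

```python
from typing import Dict, Optional

# Hypothetical slots and the question the system asks for each one.
SLOTS: Dict[str, str] = {
    "origin": "What city are you leaving from?",
    "destination": "What city are you flying to?",
    "date": "What day would you like to travel?",
}


def update(frame: Dict[str, str], parsed: Dict[str, str]) -> Dict[str, str]:
    """Return a new frame with any recognized, non-empty slots filled in."""
    new_frame = dict(frame)
    for slot, value in parsed.items():
        if slot in SLOTS and value:
            new_frame[slot] = value
    return new_frame


def next_prompt(frame: Dict[str, str]) -> Optional[str]:
    """Ask about the first unfilled slot; None means the frame is complete."""
    for slot, question in SLOTS.items():
        if not frame.get(slot):
            return question
    return None
```

Because `update` accepts any subset of slots in one turn, the user can volunteer several values at once, which gives a limited form of mixed initiative while keeping the control logic simple and predictable.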