Title: Challenges in Dialogue
1 Challenges in Dialogue
- Discourse and Dialogue
- CMSC 35900-1
- October 27, 2006
2 Roadmap
- Issues in Dialogue
- Dialogue vs General Discourse
- Dialogue Acts
- Modeling
- Recognition and Interpretation
- Dialogue Management for Computational Agents
3 Dialogue vs General Discourse
- Key contrast: two or more speakers
- Primary focus on speech
- Issues in multi-party spoken dialogue
- Turn-taking: who speaks next, and when?
- Collaboration: clarification, feedback, ...
- Disfluencies
- Adjacency pairs, dialogue acts
4 Turn-Taking
- Multi-party discourse
- Need to trade off speaker/hearer roles
- Interpret reference from sequential utterances
- When?
- End of sentence?
- No: multi-utterance turns
- Silence?
- No: little silence in smooth dialogue, < 250 ms
- When other starts speaking?
- No: relatively little overlap face-to-face, 5%
5 Turn-taking: When
- Rule-governed behavior
- Possibly multiple legal turn change times
- Aka transition-relevance places (TRP)
- Generally at utterance boundaries
- Utterance not necessarily sentence
- In fact, utterance/sentence boundaries are not obvious in speech
- Speakers don't necessarily pause between sentences
- Automatic utterance boundary detection
- Cue words (okay, so, ...), POS sequences, prosody
6 Turn-taking: Who & How
- At each TRP in each turn (Sacks 1974)
- If speaker has selected A to speak, A must take the floor
- If speaker has selected no one to speak, anyone can
- If no one else takes the turn, the speaker can
- Selecting speaker A
- By explicit/implicit mention: "What about it, Bob?"
- By gaze, function
- Selecting others: questions, greetings, closing
- (Traum et al., 2003)
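The three turn-allocation rules above can be sketched as a small function. This is a minimal illustration, not code from the lecture: the name `next_speaker`, its parameters, and the tie-break among several self-selecting participants (the first to start wins) are all assumptions.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 self_selectors: Sequence[str]) -> str:
    """Apply the turn-allocation rules at a single TRP.

    current        -- participant who currently holds the floor
    selected       -- participant the current speaker selected, or None
    self_selectors -- other participants who start speaking, in order of
                      onset (hypothetical input; first starter wins)
    """
    # Rule 1: if the speaker selected someone, that person takes the floor.
    if selected is not None:
        return selected
    # Rule 2: with no selection, anyone else may take the turn.
    for person in self_selectors:
        if person != current:
            return person
    # Rule 3: if no one else takes the turn, the speaker may continue.
    return current
```

For example, `next_speaker("Ann", "Bob", ["Cy"])` gives the floor to Bob because explicit selection takes priority over self-selection.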
7 Turn-taking in HCI
- Human turn end
- Detected by 250ms silence
- System turn end
- Signaled by end of speech
- Indicated by any human sound
- Barge-in
- Continued attention
- No signal
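Detecting the end of a human turn from 250 ms of silence can be sketched as a scan over per-frame energy values. This is only a sketch under stated assumptions: the frame length, the energy threshold, and the function name `detect_turn_end` are hypothetical, and a deployed system would use a proper voice activity detector.

```python
from typing import Optional, Sequence


def detect_turn_end(energies: Sequence[float],
                    frame_ms: int = 10,
                    silence_threshold: float = 0.01,
                    min_silence_ms: int = 250) -> Optional[int]:
    """Return the frame index at which a silence run following speech
    first reaches min_silence_ms, or None if that never happens.

    frame_ms and silence_threshold are illustrative values (assumptions).
    """
    needed = -(-min_silence_ms // frame_ms)  # ceiling division
    run = 0
    heard_speech = False
    for i, energy in enumerate(energies):
        if energy >= silence_threshold:
            heard_speech = True   # speech frame: reset the silence run
            run = 0
        elif heard_speech:
            run += 1              # silent frame after some speech
            if run >= needed:
                return i
    return None
```

Silence before any speech is ignored, so the user is not cut off before starting to talk.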
8 Gesture, Gaze & Voice
- Range of gestural signals
- Head (nod, shake), shoulder, hand, leg, foot movements; facial expressions; postures; artifacts
- Align with syllables
- Units: phonemic clause change
- Study with recorded exchanges
9 Yielding the Floor
- Turn change signal
- Offer floor to auditor/hearer
- Cues: pitch fall, lengthening, "but uh", end gesture, amplitude drop + "uh", end clause
- Likelihood of change increases with more cues
- Negated by any gesticulation
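The interaction between the cues and gesticulation can be sketched as a simple cue count. This is an illustrative simplification: the cue identifiers and the function `yield_score` are hypothetical, and the count is only a stand-in for "more cues, more likely", not a calibrated probability.

```python
from typing import Iterable

# Cue names paraphrase the list above; the identifiers are hypothetical.
YIELD_CUES = frozenset({
    "pitch_fall", "lengthening", "but_uh",
    "end_gesture", "amplitude_drop_uh", "end_clause",
})


def yield_score(observed: Iterable[str], gesticulating: bool) -> int:
    """Count the turn-yielding cues present at a possible turn change.

    Any gesticulation negates the signal, so the score is 0 in that case.
    """
    if gesticulating:
        return 0
    return len(YIELD_CUES & set(observed))
```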
10 Taking the Floor
- Speaker-state signal
- Indicate becoming speaker
- Occurs at beginning of turns
- Cues
- Shift in head direction
- AND/OR
- Start of gesture
11 Retaining the Floor
- Within-turn signal
- Still speaker: look at hearer at end of clause
- Continuation signal
- Still speaker: look away after within-turn signal/back-channel
- Back-channel
- "mmhm"/"okay"/etc.; nods
- Sentence completion; clarification request; restatement
- NOT a turn signal: attention, agreement, confusion
12 Segmenting Turns
- Speaker alone
- Within-turn signal -> end of one unit
- Continuation signal -> beginning of next unit
- Joint signal
- Speaker turn signal (end): auditor -> speaker, speaker -> auditor
- Within-turn back-channel continuation
- Back-channels signal understanding
- Early back-channel continuation
13 Regaining Attention
- Gaze & Disfluency
- Disfluency: perturbation in speech
- Silent pause, filled pause, restart
- Gaze
- Conversants don't stare at each other constantly
- However, speaker expects to meet hearer's gaze
- Confirm hearer's attention
- Disfluency occurs when speaker realizes hearer is NOT attending
- Pause until hearer begins gazing, or to request attention
14 Improving Human-Computer Turn-taking
- Identifying cues to turn change and turn start
- Meeting conversations
- Recorded, natural research meetings
- Multi-party
- Overlapping speech
- Units: "spurts" between 500 ms silences
- Can predict on-line likely turn end
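Segmenting one speaker's time-aligned words into spurts can be sketched as follows. This is a minimal sketch: the `(token, start_ms, end_ms)` input format and the function name `segment_spurts` are assumptions, and treating a gap of exactly 500 ms as a boundary is an arbitrary choice.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, int, int]  # (token, start_ms, end_ms); hypothetical format


def segment_spurts(words: List[Word],
                   min_gap_ms: int = 500) -> List[List[str]]:
    """Split one speaker's words into spurts separated by silences of
    at least min_gap_ms."""
    spurts: List[List[str]] = []
    prev_end: Optional[int] = None
    for token, start, end in words:
        # Start a new spurt at the first word or after a long enough silence.
        if prev_end is None or start - prev_end >= min_gap_ms:
            spurts.append([])
        spurts[-1].append(token)
        prev_end = end
    return spurts
```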
15 Text + Prosody
- Text sequence
- Modeled as n-gram language model
- Implement as HMM
- Prosody
- Duration, Pitch, Pause, Energy
- Decision trees estimate class probabilities
- Integrate LM + DT
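One simple way to combine a language-model estimate with a decision-tree estimate is log-linear interpolation of the two class distributions. This is a sketch of one possible integration scheme, not necessarily the one used in the work summarized here; the function name `combine`, the weight, and the example probabilities are hypothetical.

```python
import math
from typing import Dict


def combine(lm: Dict[str, float],
            dt: Dict[str, float],
            weight: float = 0.5) -> Dict[str, float]:
    """Log-linearly interpolate two distributions over the same labels.

    weight is the share given to the language model (an assumption).
    All probabilities must be strictly positive.
    """
    scores = {label: weight * math.log(lm[label])
              + (1.0 - weight) * math.log(dt[label])
              for label in lm}
    total = sum(math.exp(s) for s in scores.values())
    # Renormalize so the combined scores sum to 1.
    return {label: math.exp(s) / total for label, s in scores.items()}


# Made-up example values for one inter-word position.
lm_probs = {"sentence_end": 0.6, "disfluency": 0.1, "continuation": 0.3}
dt_probs = {"sentence_end": 0.5, "disfluency": 0.3, "continuation": 0.2}
combined = combine(lm_probs, dt_probs)
```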
16 Decision Trees
[Figure: example decision tree with nodes A through G; branch tests X = t, X = f, Y > 1, Y < 2, Y < 1, Y > 2; leaf labels None, Sentence End, Sentence End, Disfluency]
17 Interpreting Breaks
- For each inter-word position
- Is it a disfluency, sentence end, or continuation?
- Key features
- Pause duration, vowel duration
- 62% accuracy wrt 50% chance baseline
- 90% overall
- Best combines LM + DT
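The per-position decision can be pictured as a small hand-built decision tree over the two key features named above. Every threshold and leaf probability below is an invented placeholder for illustration, not a value from the study, and `classify_break` is a hypothetical name.

```python
from typing import Dict


def classify_break(pause_ms: float, vowel_ms: float) -> Dict[str, float]:
    """Return placeholder class probabilities for one inter-word position.

    All thresholds and probabilities are hypothetical.
    """
    if pause_ms >= 300:          # long pause after the word
        if vowel_ms >= 150:      # lengthened final vowel
            return {"sentence_end": 0.8, "disfluency": 0.1, "continuation": 0.1}
        return {"sentence_end": 0.3, "disfluency": 0.6, "continuation": 0.1}
    if vowel_ms >= 150:          # short pause, lengthened vowel
        return {"sentence_end": 0.2, "disfluency": 0.3, "continuation": 0.5}
    return {"sentence_end": 0.05, "disfluency": 0.05, "continuation": 0.9}


def best_label(probs: Dict[str, float]) -> str:
    """Pick the most probable label at a leaf."""
    return max(probs, key=probs.get)
```

Returning a distribution at each leaf, rather than a single label, is what allows the tree's output to be combined with a language model score.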
18 Jump-in Points
- (Used) Possible turn changes
- Points WITHIN spurt where new speaker starts
- Key features
- Pause duration, low energy, pitch fall
- Accuracy: 65% wrt 50% baseline
- Performance depends only on preceding prosodic features
19 Jump-in Features
- Do people speak differently when they jump in?
- Differ from regular turn starts?
- Examine only first words of turns
- No LM
- Key features
- Raised pitch, raised amplitude
- Accuracy: 77% wrt 50% baseline
- Prosody only
20 Collaborative Communication
- Speaker tries to establish and add to common ground (mutual belief)
- Presumed a joint, collaborative activity
- Make sure both mutually believe the same thing
- Hearer can acknowledge/accept/disagree
- Clark & Schaefer: degrees of grounding
- Display, Demonstrate/Reformulate, Acknowledgement, Next relevant contribution, Continued attention
21 Computational Models
- (Traum et al.) revised for computation
- Involves both speaker and hearer
- Initiate, Continue, Acknowledge, Repair, Request Repair, etc.
- Common phenomena
- Back-channel: "uh-huh", "okay", etc.
- Allows hearer to signal continued attention, acknowledgment
- WITHOUT taking the turn
- Requests for repair common in human-human
- Even more common in human-computer dialogue
22 Implicature & Grice's Maxims
- Inferences licensed by utterances
- Grice's Maxims
- Quantity: Be as informative as required
- "There are two classes per week": not 1, or 5
- Quality: Be truthful; don't lie, ...
- Relevance: Be relevant
- Manner: Be perspicuous
- Don't be obscure, ambiguous, prolix, or disorderly
- Flouting maxims: consciously violate for effect
- Humor, emphasis, ...
23 Speech & Dialogue Acts
- Speech Acts (Austin, Searle)
- Doing things with words
- E.g. performatives: "I dub thee Sir Lancelot"
- Illocutionary acts: act of asking, answering, promising, etc. in saying an utterance
- Include: Assertives "I propose to ...", Directives "Stop that", Commissives "I promise", Expressives "Thank you", Declarations "You're fired"
24 Dialogue Acts
- (aka Conversational moves)
- Enriched set of speech acts
- Capture full range of conversational functions
- Adjacency pairs: many two-part structures
- E.g. Question-Answer, Greeting-Greeting, Request-Grant, etc.
- Paired for speaker-hearer dyads
- Contrast with rhetorical relations in monologue
25 DAMSL
- Dialogue Act Tagging framework
- Adjacency pairs + grounding + repair
- Forward looking functions
- Statement, info-request, commit, closing, etc
- Backward looking functions
- Focus on link to prior speaker utterance
- Agreement, answer, accept, etc.
26 Tagged Dialogue
[assert] C1: ... I need to travel in May.
[info-req, ack] A1: And, what day in May did you want to travel?
[assert, answer] C2: OK uh I need to be there for a meeting that's from the 12th to the 15th.
[info-req, ack] A2: And you're flying into what city?
[assert, answer] C3: Seattle.
[info-req, ack] A3: And what time would you like to leave Pittsburgh?
[check, hold] C4: Uh hmm I don't think there's many options for non-stop.
[accept, ack] A4: Right.
[assert] There's three non-stops today.
[info-req] C5: What are they?
[assert, open-option] A5: The first one departs PGH at 10:00am arrives Seattle at 12:05 their time. The second flight departs PGH at 5:55pm, arrives Seattle at 8pm. And the last flight departs PGH at 8:15pm arrives Seattle at 10:28pm.
[accept, ack] C6: OK I'll take the 5ish flight on the night before on the 11th.
[check, ack] A6: On the 11th?
[assert, ack] OK. Departing at 5:55pm arrives Seattle at 8pm, U.S. Air flight 115.
[ack] C7: OK.
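A tagged dialogue of this kind can be held in a simple data structure, one record per utterance with its set of dialogue act tags. The sketch below encodes the first five turns of the dialogue; the class name `TaggedUtterance` and the helper `with_tag` are hypothetical and not part of DAMSL.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TaggedUtterance:
    turn: str               # turn label, e.g. "C1" (client) or "A1" (agent)
    tags: Tuple[str, ...]   # one or more dialogue act tags
    text: str


dialogue: List[TaggedUtterance] = [
    TaggedUtterance("C1", ("assert",), "... I need to travel in May."),
    TaggedUtterance("A1", ("info-req", "ack"),
                    "And, what day in May did you want to travel?"),
    TaggedUtterance("C2", ("assert", "answer"),
                    "OK uh I need to be there for a meeting that's "
                    "from the 12th to the 15th."),
    TaggedUtterance("A2", ("info-req", "ack"),
                    "And you're flying into what city?"),
    TaggedUtterance("C3", ("assert", "answer"), "Seattle."),
]


def with_tag(utterances: List[TaggedUtterance], tag: str) -> List[str]:
    """Return the turn labels of all utterances carrying the given tag."""
    return [u.turn for u in utterances if tag in u.tags]
```

Allowing several tags per utterance reflects the fact that one utterance can serve more than one function, such as acknowledging the prior turn while also requesting information.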
27 Dialogue Act Recognition
- Goal: identify dialogue act tag(s) from surface form
- Challenge: surface form can be ambiguous
- "Can you X?": yes/no question, or info-request
- "Flying on the 11th, at what time?": check, statement
- Requires interpretation by hearer
- Strategies: plan inference, cue recognition
28 Plan-inference-based
- Classic AI (BDI) planning framework
- Model Belief, Knowledge, Desire
- Formal definition with predicate calculus
- Axiomatization of plans and actions as well
- STRIPS-style: Preconditions, Effects, Body
- Rules for plan inference
- Elegant, but...
- Labor-intensive rule, KB, heuristic development
- Effectively AI-complete
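A STRIPS-style action with preconditions, effects, and a body can be sketched as a small data structure. This is a deliberate simplification: propositions are plain strings rather than predicate-calculus formulas, effects only add facts (no delete list), and the action `request_info` with its proposition names is a hypothetical example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: FrozenSet[str]   # must hold before the action
    effects: FrozenSet[str]         # facts added by the action
    body: Tuple[str, ...] = ()      # sub-steps that realize the action

    def applicable(self, state: FrozenSet[str]) -> bool:
        return self.preconditions <= state

    def apply(self, state: FrozenSet[str]) -> FrozenSet[str]:
        if not self.applicable(state):
            raise ValueError(f"preconditions of {self.name} not met")
        return state | self.effects


def actions_achieving(goal: str, actions: List[Action]) -> List[str]:
    """One plan-inference step: which actions could produce this effect?"""
    return [a.name for a in actions if goal in a.effects]


# Hypothetical example action.
request_info = Action(
    name="request_info",
    preconditions=frozenset({"speaker_wants_info"}),
    effects=frozenset({"hearer_knows_speaker_wants_info"}),
    body=("utter_question",),
)
```

Even in this toy form, every action and proposition has to be written by hand, which hints at why the approach is labor-intensive.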
29 Cue-based Interpretation
- Employs sets of features to identify
- Words and collocations: "Please" -> request
- Prosody: rising pitch -> yes/no question
- Conversational structure: prior act
- Example: Check
- Syntax: tag question ", right?"
- Syntax + prosody: fragment with rise
- N-gram: $\arg\max_d P(d)\,P(W \mid d)$
- "So you", "sounds like", etc.
- Details later...
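The word-based decision rule, choosing the dialogue act d that maximizes P(d) P(W|d), can be sketched for the unigram case (N = 1) with add-one smoothing. The training utterances below are invented toy data, and the function names `train` and `classify` are hypothetical; a realistic system would use higher-order n-grams and a tagged corpus.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


def train(examples: List[Tuple[str, str]]):
    """Count dialogue acts and the words seen with each act."""
    act_counts: Counter = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for act, text in examples:
        words = text.lower().split()
        act_counts[act] += 1
        word_counts[act].update(words)
        vocab.update(words)
    return act_counts, word_counts, vocab


def classify(text: str, model) -> str:
    """Return argmax over d of log P(d) + sum of log P(w|d)."""
    act_counts, word_counts, vocab = model
    total = sum(act_counts.values())
    best_act, best_score = None, -math.inf
    for act in sorted(act_counts):
        score = math.log(act_counts[act] / total)             # log P(d)
        # Add-one smoothing, with one extra slot for unseen words.
        denom = sum(word_counts[act].values()) + len(vocab) + 1
        for w in text.lower().split():
            score += math.log((word_counts[act][w] + 1) / denom)
        if score > best_score:
            best_act, best_score = act, score
    return best_act


# Invented toy training data.
examples = [
    ("check", "so you want the morning flight"),
    ("check", "sounds like you need a window seat"),
    ("request", "please book the flight"),
    ("request", "please send the ticket"),
]
model = train(examples)
```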
30 From Human to Computer
- Conversational agents
- Systems that (try to) participate in dialogues
- Examples: directory assistance, travel info, weather, restaurant and navigation info
- Issues
- Limited understanding: ASR errors, interpretation
- Computational costs
- Broader coverage -> slower, less accurate
31 Dialogue Manager Tradeoffs
- Flexibility vs Simplicity/Predictability
- System vs User vs Mixed Initiative
- Order of dialogue interaction
- Conversational naturalness vs Accuracy
- Cost of model construction, generalization, learning, etc.
- Models: FST, Frame-based, HMM, BDI
- Evaluation frameworks
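As one concrete point in this design space, a frame-based manager with system initiative can be sketched as a loop that asks about the first unfilled slot. The slot names, prompts, and function `next_prompt` are hypothetical, loosely modeled on the travel dialogue earlier in these slides.

```python
from typing import Dict, Optional

# Slots are asked in this fixed order (system initiative); all hypothetical.
PROMPTS: Dict[str, str] = {
    "date": "What day did you want to travel?",
    "destination": "You're flying into what city?",
    "departure_time": "What time would you like to leave?",
}


def next_prompt(frame: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when the
    frame is complete."""
    for slot, prompt in PROMPTS.items():
        if frame.get(slot) is None:
            return prompt
    return None
```

The fixed question order makes the system simple and predictable, at the cost of the flexibility that a mixed-initiative design would offer.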