TurnTaking, Grounding and Speaker Segmentation - PowerPoint PPT Presentation

About This Presentation
Title:

TurnTaking, Grounding and Speaker Segmentation

Provided by: juliahir

Transcript and Presenter's Notes

Title: TurnTaking, Grounding and Speaker Segmentation


1
Turn-Taking, Grounding and Speaker Segmentation
  • Agustín Gravano
  • CS4705

2
Today
  • Turn-taking behaviors in human-human conversation
  • Conversational Analysis accounts
  • Task/circumstance/individual dependencies
  • Linguistic/cultural differences
  • Grounding analyses
  • Speech processing tasks
  • Online turn identification for SDS
  • Speaker diarization, segmentation, identification

3
Turn-taking Behavior
  • How do speakers know when it is appropriate to
    contribute to a conversation?
  • Conversational Analysis theory: conversational
    partners expect certain patterns of behavior in
    normal conversation
  • Pat: You got an A? That's great!
  • Chris: Yeah, I'm really smart, you know.
  • Chris: Well, I was just lucky. I happened to read
    the chapter on dialogue systems right before the
    test. Otherwise I would never have squeaked
    through.
  • General patterns in ordinary conversation
  • Deviation is significant

4
Expectations of What to Say Depend on Task at Hand
  • Telephone
  • Openings
  • Pat: Hello?
  • Chris: Hi, Pat. It's Chris.
  • Pat: Hi!
  • Closings (6-turn)
  • Chris: Well, I just wanted to see how you were
    doing
  • Pat: Thanks for calling. We'll have to have lunch
    sometime
  • Chris: I'd like to
  • Pat: Okay
  • Chris: Okay
  • Pat: See you
  • Chris: Yeah, see you

5
  • Email / Chat
  • Pat: Hi, can we switch lunch to 12:30? I'm
    running late.
  • Chris: Sure. 12:30.
  • Pat: Great. See you.
  • Service encounters
  • Clerk: Good morning. Is there something I can
    help you with?
  • Pat: Hi. Yeah. I wonder if you could show me.
  • Meetings
  • Boss: Today I want to focus on next year's goal
    statements. Chris, could you report please.
  • Chris:
  • Boss: Pat, now let's hear from you
  • Pat:
  • News broadcasts
  • Anchor: Chris Smith reports from Rome now on the
    upcoming conclave. Chris?
  • Reporter: Thanks, Pat. ... And now back to Pat
    Jones in New York.

6
Conversational Analysis (Sacks et al. 74)
  • Can we characterize expectations of what to say
    more generally?
  • Rules of turn-taking
  • If, during this turn, the current speaker has
    selected A as the next speaker, then A must speak
    next
  • If the current speaker does not select the next
    speaker, any other speaker may take the next turn
  • If no one else takes the next turn, the current
    speaker may take the next turn
  • Rules apply at Transition Relevance Places (TRPs),
    where something allows speaker changes to occur
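
The three rules above can be read as an ordered decision procedure applied at each TRP. The following is a minimal sketch under that reading; the function name `next_speaker`, its arguments, and the assumption that the first volunteer wins the turn are illustrative choices, not part of Sacks et al.'s account. Rule 3 only says the current speaker "may" continue; the sketch simply returns the current speaker in that case.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 volunteers: Sequence[str]) -> str:
    """Apply the three turn-taking rules, in order, at a TRP."""
    # Rule 1: the current speaker selected someone, so that person speaks next.
    if selected is not None:
        return selected
    # Rule 2: nobody was selected, so any other speaker may self-select.
    # Assumption: the first volunteer to start talking gets the turn.
    for person in volunteers:
        if person != current:
            return person
    # Rule 3: no one else took the turn, so the current speaker may continue.
    return current


# The meeting example: the boss selects Chris explicitly.
print(next_speaker("Boss", "Chris", ["Pat"]))
```

With no selection and no volunteers, the sketch hands the turn back to the current speaker, which corresponds to a speaker continuing after a pause.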

7
Conversational Analysis (Sacks et al. 74)
  • Adjacency pairs
  • Question/answer
  • Greeting/greeting
  • Compliment/downplayer
  • Dispreferred responses
  • Silence
  • "No" to a simple request without explanation
  • Changing the topic abruptly without transition
  • Important for Spoken Dialogue Systems

8
  • Developmental Psychology
  • Children learn turn-taking within first 2 years
    (Stern 74)
  • Children liked by their peers are more skilled
    (Black & Hazen 90)
  • General individual differences
  • Shy people pause longer and speak less and less
    often (Pilkonis 77)
  • Schizophrenics, neurotics, depressed people less
    skilled in turn-taking

9
Cultural Differences in Turn-Taking
  • Telephone conversations
  • Openings (Zhu 04)
  • Mandarin vs. British
  • Identification differences
  • British self-report
  • Chinese callees ask the caller
  • Finnish business calls (Halmari 93) vs. American
  • Americans get right to the point
  • Finns chat

10
But where is the intent? Purpose?

11
Grounding Approaches to Conversational Modeling
  • Conversation is a joint process through which
    speakers are constantly negotiating a common
    ground (Stalnaker 78, Clark 96 inter alia)
  • Principle of Closure: agents performing an action
    require evidence that they have succeeded (Norman
    88) or not.
  • Clark & Schaefer 89
  • Presentation (by S) and Acceptance (by H) via:
  • Continued attention, acknowledgement/backchannel,
    demonstration, display, relevant next
    contribution.

12
Presentation and Acceptance (Clark & Schaefer 89)
  • S: Jon Stewart is my favorite comedian
  • H: (continued attention)
  • H: Mhmm (acknowledgement/backchannel)
  • H: Your favorite comedian (display)
  • H: He's the funniest person you know
    (demonstration)
  • H: The Daily Show is not to miss (relevant next
    contribution)

13
When Is It Appropriate to Speak? (Duncan 72)
  • Analyze acoustic/prosodic and gestural
    information in two face-to-face conversations.
  • Turn-yielding cues
  • Slower speaking rate
  • Drop in pitch or loudness
  • Completion of syntactic clause
  • Termination of hand gesticulation
  • Rising or falling final intonation
  • Expressions like "you know"
  • Turn-keeping cues
  • Hands engaged in gesticulation
  • Filled pauses

14
When Is It Appropriate to Speak? (Beattie 82)
  • Who interrupts?
  • Less intelligent, highly neurotic, extroverted
  • Men interrupt women
  • Interruptions may indicate
  • Desire for dominance
  • Desire for social approval
  • Convey enthusiasm, involvement
  • Data: 25m televised interviews before the 1979
    British General Election
  • Margaret Thatcher (Tory leader), the "Iron Lady"
  • Jim Callaghan (Prime Minister), "Sunny Jim"

15
  • Beattie's classification scheme
  • Identify spkr 2's attempts to take the turn
  • Smooth switch: spkr 1's utterance complete, turn
    to spkr 2, no simultaneous speech
  • Overlap: spkr 1's utterance complete, turn to
    spkr 2, simultaneous speech
  • Simple interruption: spkr 1 doesn't complete
    utterance, turn to spkr 2, simultaneous speech
  • Silent interruption: spkr 1's utterance
    incomplete, turn to spkr 2, no simultaneous
    speech
  • Butting-in: simultaneous speech but no change of
    turn, spkr 1 keeps the turn
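
The five categories above are determined by three yes/no observations: whether the turn passes to spkr 2, whether spkr 1's utterance is complete, and whether there is simultaneous speech. A minimal sketch of that decision table follows; the function and argument names are illustrative assumptions, and the "unclassified" label covers the one combination the scheme does not list (no turn change and no simultaneous speech).

```python
def classify_attempt(turn_changes: bool,
                     spkr1_complete: bool,
                     simultaneous: bool) -> str:
    """Map the three observations onto the categories listed above."""
    if not turn_changes:
        # spkr 1 keeps the turn; the scheme only names the case
        # where spkr 2 spoke at the same time.
        return "butting-in" if simultaneous else "unclassified"
    if spkr1_complete:
        return "overlap" if simultaneous else "smooth switch"
    return "simple interruption" if simultaneous else "silent interruption"


print(classify_attempt(turn_changes=True, spkr1_complete=False,
                       simultaneous=True))
```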

16
Beattie 82 - Results
  • Thatcher is interrupted almost twice as often as
    she interrupts the interviewer (19/10), unlike
    Callaghan (14/23).
  • Why is Thatcher interrupted?
  • Interruptions come at the end of a syntactic clause,
    with drawl on the stressed syllable in the clause and
    falling intonation: 3 turn-yielding cues!
  • Thatcher has fewer filled pauses (4) than
    Callaghan (22): a turn-keeping cue.
  • Why does she do this?
  • Speech training before election?

17
Beattie 82 - Results
  • Public perception: Thatcher is domineering in
    interviews and Callaghan is a nice guy
  • Why is she still perceived as domineering?
  • When interrupted she does not cede the floor
    despite lengthy stretches of simultaneous speech

18
Online Turn Identification for SDS
  • Push-to-talk systems
  • Silence detection
  • Not what humans do!
  • Speech detection
  • Barge-in
  • Need more natural turn-taking support
  • When are users ready to be interrupted?
  • When do they want to keep the floor?
  • When do they expect the system to backchannel?
  • How can we indicate when the system has finished
    its turn?
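
As a point of comparison for the questions above, silence detection can be sketched as a fixed threshold on frame energy: the system declares the user's turn over once it has seen enough consecutive quiet frames after some speech. The function name, the energy scale, and both default parameters below are assumptions chosen for illustration, not values from any particular SDS.

```python
from typing import Optional, Sequence


def end_of_turn_frame(energies: Sequence[float],
                      threshold: float = 0.1,
                      min_silent_frames: int = 5) -> Optional[int]:
    """Return the index of the frame at which the turn is declared over.

    Returns None if no long-enough silence follows any speech.
    """
    silent_run = 0
    heard_speech = False
    for i, energy in enumerate(energies):
        if energy >= threshold:
            # Speech frame: remember that the user has started, reset the run.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: extend the current silent run.
            silent_run += 1
            if silent_run >= min_silent_frames:
                return i
    return None


# A short mid-turn pause (frames 3-4) is ignored; the longer one ends the turn.
frames = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0]
print(end_of_turn_frame(frames, min_silent_frames=3))
```

A detector of this kind only reacts after the silence has already happened, which is one way to see why the slide notes that this is not what humans do.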

19
Other Dialogue Processing Tasks
  • Speaker Diarization
  • Speaker Segmentation
  • Speaker Identification (≠ Speaker Verification)

20
Speaker Diarization
  • Process of partitioning an input audio stream
    into homogeneous segments according to the
    speaker identity.
  • SPEECH → segment 1, segment 2, segment 3
  • Outputs no information about the speakers'
    identities.
  • Broadcast News, meetings, telephone conversations

21
Speaker Segmentation
  • Given the diarization output, cluster together
    the segments corresponding to the same speaker,
    based on acoustic features.
  • segment 1, segment 2, segment 3, segment 4
    → segment 1: speaker 1, segment 2: speaker 2,
    segment 3: speaker 1, segment 4: speaker 3
  • State-of-the-art: 8.47% error
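
The clustering step described above can be illustrated with a small greedy procedure: each segment is assigned to the nearest existing speaker cluster if it is close enough, and otherwise starts a new speaker. This is a sketch under strong assumptions, not the state-of-the-art method the slide refers to: each segment is assumed to be already summarized as one fixed-length feature vector, and `cluster_segments` and `max_distance` are hypothetical names.

```python
import math
from typing import List, Sequence


def cluster_segments(features: Sequence[Sequence[float]],
                     max_distance: float = 1.0) -> List[int]:
    """Assign a 1-based speaker label to each segment, in order."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for vec in features:
        # Find the closest existing speaker centroid, if any.
        best, best_dist = -1, math.inf
        for k, centroid in enumerate(centroids):
            d = math.dist(vec, centroid)
            if d < best_dist:
                best, best_dist = k, d
        if best == -1 or best_dist > max_distance:
            # Too far from every known speaker: start a new cluster.
            centroids.append([float(v) for v in vec])
            counts.append(1)
            labels.append(len(centroids))
        else:
            # Close enough: assign and update the running mean.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (v - c) / n
                               for c, v in zip(centroids[best], vec)]
            labels.append(best + 1)
    return labels


# Four made-up 2-D feature vectors, one per segment.
print(cluster_segments([[0.0, 0.0], [5.0, 5.0], [0.2, 0.1], [10.0, 0.0]]))
```

With these made-up vectors the labels follow the same pattern as the slide's example: segments 1 and 3 share a speaker, while segments 2 and 4 each get their own.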

22
Broadcast News
  • <DOC>
  • <DOCNO> CNN19980104.1130.0034 </DOCNO>
  • <DOCTYPE> NEWS STORY </DOCTYPE>
  • <DATE_TIME> 01/04/1998 11:30:34.71 </DATE_TIME>
  • <BODY>
  • <TEXT>
  • a fire in northern kentucky is forcing 3,000
    people in two states to flee their homes.
  • the fire started early this morning at the
    cargill company plant in maysville near the
    ohio river.
  • authorities have been going door-to-door advising
    people in kentucky and ohio to take shelter in
    area high schools.
  • the fire is in a building where several
    fertilizers and chemicals are stored.
  • officials say all they can do is let the fire
    burn itself out, because spraying water on the
    flames would be too dangerous.
  • at the current time, our only way of getting it
    under control is to stay away from it.
  • we've backed everyone off from the fire by about
    a mile and a quarter and evacuated homes in that
    radius and the chief threat at this point is a
    very small risk of a very large explosion caused
    by 400 tons of ammonia nitrate stored in the
    building.
  • four people have been taken to hospitals.

23
Speaker Identification
  • Problem of identifying a person solely from their
    speech.
  • Not the same as speaker verification (verifying
    whether the speaker is who they claim to be).
  • Linguistic information to identify speaker types
    and speaker names on Broadcast News data (LIMSI
    04)
  • Templates (<name> has this report from
    <location>)
  • Results: 10.9% error on test set
  • But only 10% of segments contain relevant
    patterns
  • Estimate: 25% error on Broadcast News if speaker
    clustering is done to identify all of each
    speaker's segments
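
To make the template idea concrete, here is a minimal sketch of matching one such template with a regular expression over lowercased transcript text. The pattern is an illustrative assumption (it expects a two-word name and is not the LIMSI system's actual rule set), and the example sentence is made up.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for the template
# "<name> has this report from <location>"; assumes a two-word name.
REPORT_TEMPLATE = re.compile(
    r"\b(?P<name>[a-z]+ [a-z]+) has this report from "
    r"(?P<location>[a-z]+(?: [a-z]+)*)"
)


def match_report_template(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (name, location) if the sentence fits the template."""
    m = REPORT_TEMPLATE.search(sentence.strip().lower())
    if m is None:
        return None
    return m.group("name"), m.group("location")


print(match_report_template("Chris Smith has this report from Rome."))
```

Sentences that introduce a reporter with other wording do not match this pattern, which is consistent with the slide's point that only a small share of segments contain usable patterns.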

24
  • <DOC>
  • <DOCNO> CNN19980104.1130.0108 </DOCNO>
  • <DOCTYPE> NEWS STORY </DOCTYPE>
  • <DATE_TIME> 01/04/1998 11:31:48.11 </DATE_TIME>
  • <BODY>
  • <TEXT>
  • unexpected weather conditions are the rule across
    much of the united states this weekend.
  • angela astore reports.
  • <TURN>
  • it was a nice day to play along the beach --
    spend a few hours fishing -- or get in a game of
    golf -- not uncommon -- unless it's january in
    chicago.
  • record high temperatures were set yesterday from
    minnesota to massachusetts.
  • warm air drawn northward from the gulf of mexico
    was behind the rise in the mercury.
  • it was a different scene in the northwest, where
    snow is the story.
  • but the winter weather didn't stop this man from
    getting in some warmer pursuits.
  • and he wasn't bothered by the fact that he
    couldn't see where his golf balls landed.
  • <TURN>
  • it's not really where it's going to land that's
    important at this point
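
The TURN markers in the transcript above mark the points where the speaker changes, so they can serve as reference boundaries when checking a segmentation. A minimal sketch of splitting a story body on those markers follows; `split_turns` is a hypothetical helper, the marker string is assumed to be `<TURN>`, and the sample text is abbreviated from the story above.

```python
from typing import List


def split_turns(text: str, marker: str = "<TURN>") -> List[str]:
    """Split a transcript body into turns at each marker."""
    # Collapse line breaks and repeated spaces inside each turn.
    turns = [" ".join(part.split()) for part in text.split(marker)]
    # Drop empty pieces, e.g. when the text starts with a marker.
    return [t for t in turns if t]


body = """angela astore reports.
<TURN>
it was a nice day to play along the beach
<TURN>
it's not really where it's going to land"""

for number, turn in enumerate(split_turns(body), start=1):
    print(number, turn)
```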

25
Conclusions
  • Turn-taking models and theories of grounding are
    of considerable potential use in SDS.
  • What is the user likely to say next, and when?
  • What type of response does s/he expect the system
    to make? When?
  • Obstacles for practical use
  • What cues signal when it is appropriate to speak?
  • How do we negotiate a common system/user ground?

26
  • Questions?

27
  • Happy Thanksgiving!