Spoken Dialogue Systems - PowerPoint PPT Presentation

Slides: 24
Provided by: juliah162

Transcript and Presenter's Notes


1
  • Spoken Dialogue Systems

2
Talking to a Machine... and (often) Getting an
Answer
  • Today's spoken dialogue systems make it possible
    to accomplish real tasks without talking to a
    person
  • Could Eliza do this?
  • What do today's systems do better?
  • Do they actually embody human intelligence?
  • Key advances:
  • Stick to goal-directed interactions in a limited
    domain
  • Prime users to adopt the vocabulary you can
    recognize
  • Partition the interaction into manageable stages
  • Judicious use of system vs. mixed initiative

3
Dialogue vs. Monologue
  • Monologue and dialogue both involve interpreting:
  • Information status
  • Coherence issues
  • Reference resolution
  • Speech acts, implicature, intentionality
  • Dialogue involves managing:
  • Turn-taking
  • Grounding and repairing misunderstandings
  • Initiative and confirmation strategies

4
Segmenting Speech into Utterances
  • What is an utterance?
  • Why is end-of-utterance (EOU) detection harder than
    end-of-sentence (EOS) detection?
  • How does speech differ from text?
  • A single syntactic sentence may span several turns:
  • A: We've got you on USAir flight 99
  • B: Yep
  • A: leaving on December 1.
  • Multiple syntactic sentences may occur in a single
    turn:
  • A: We've got you on USAir flight 99 leaving on
    December. Do you need a rental car?
  • Intonational definitions: intonational phrase,
    breath group, intonation unit

5
Turns and Utterances
  • Dialogue is characterized by turn-taking: who
    should talk next, and when they should talk
  • How do we identify turns in recorded speech?
  • Little speaker overlap (around 5% in English,
    although this depends on domain)
  • But little silence between turns either
  • How do we know when a speaker is giving up or
    taking a turn? Holding the floor? How do we
    know when a speaker is interruptible?

6
Simplified Turn-Taking Rule (Sacks et al.)
  • At each transition-relevance place (TRP) of each
    turn:
  • If current speaker has selected A as next
    speaker, then A must speak next
  • If current speaker does not select next speaker,
    any other speaker may take next turn
  • If no one else takes next turn, the current
    speaker may take next turn
  • TRPs are where the structure of the language
    allows speaker shifts to occur
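
The rule above can be sketched as a small function. This is only an illustration under stated assumptions, not Sacks et al.'s formalism: the name `next_speaker` and its parameters are hypothetical, and `volunteers` is a stand-in for whoever self-selects at the TRP.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 volunteers: Sequence[str]) -> str:
    """Apply the simplified turn-taking rule at one transition-relevance place.

    current    -- the speaker holding the turn
    selected   -- the participant the current speaker selected, if any
    volunteers -- other participants trying to take the turn, in order
                  (a hypothetical stand-in for self-selection)
    """
    # First clause: a selected next speaker must speak next.
    if selected is not None:
        return selected
    # Second clause: otherwise any other speaker may take the turn;
    # here the first volunteer who is not the current speaker wins.
    for speaker in volunteers:
        if speaker != current:
            return speaker
    # Third clause: if no one else takes the turn, the current speaker
    # may continue.
    return current
```

Calling `next_speaker("A", None, [])` models the case where nobody else self-selects, so A keeps the floor.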

7
  • Adjacency pairs set up next speaker expectations
  • GREETING/GREETING
  • QUESTION/ANSWER
  • COMPLIMENT/DOWNPLAYER
  • REQUEST/GRANT
  • Significant silence is dispreferred
  • A: Is there something bothering you or not?
    (1.0s)
  • A: Yes or no? (1.5s)
  • A: Eh?
  • B: No.
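
One way a system could represent the adjacency pairs listed on this slide is a lookup from the first pair part to the expected second part. The table contents come from the slide; the names `ADJACENCY_PAIRS` and `expected_response` are hypothetical.

```python
from typing import Optional

# First pair part -> expected second pair part, as listed on the slide.
ADJACENCY_PAIRS = {
    "GREETING": "GREETING",
    "QUESTION": "ANSWER",
    "COMPLIMENT": "DOWNPLAYER",
    "REQUEST": "GRANT",
}


def expected_response(first_part: str) -> Optional[str]:
    """Return the dialogue act expected next, or None if no pair is known."""
    return ADJACENCY_PAIRS.get(first_part.upper())
```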

8
Intonational Cues to Turntaking
  • Continuation rise (L-H%) holds the floor
  • H-H% requests a response
  • L* H-H% (yes-no question contour)
  • H* H-H% (high-rise question contour)
  • Intonational contours signal dialogue acts in
    adjacency pairs

9
Timing and Turntaking
  • How should we time responses in an SDS?
  • Japanese studies of aizuchi (backchannels) (Koiso
    et al. 98, Takeuchi et al. 02) in natural speech
  • Lexical information: particles ne and ka ending the
    preceding turn, or (in telephone shopping) product
    names
  • Length of preceding utterance, f0, loudness, and
    following pause are even more important in
    predicting turn-taking

10
Turntaking and Initiative Strategies
  • System Initiative
  • S: Please give me your arrival city name.
  • U: Baltimore.
  • S: Please give me your departure city name.
  • User Initiative
  • S: How may I help you?
  • U: I want to go from Boston to Baltimore on
    November 8.
  • Mixed initiative
  • S: How may I help you?
  • U: I want to go to Boston.
  • S: What day do you want to go to Boston?
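
The difference between system and mixed initiative can be sketched as a slot-filling update. This is a minimal sketch under several assumptions: the slot names, the `date` prompt wording, and the functions `next_prompt` and `update` are hypothetical, and the user's turn is assumed to arrive already parsed into slot-value pairs.

```python
from typing import Dict, Optional

# Hypothetical slot names and prompts for a travel task.
PROMPTS = {
    "arrival_city": "Please give me your arrival city name.",
    "departure_city": "Please give me your departure city name.",
    "date": "What day do you want to travel?",
}


def next_prompt(filled: Dict[str, str]) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when done."""
    for slot, prompt in PROMPTS.items():
        if slot not in filled:
            return prompt
    return None


def update(filled: Dict[str, str], parsed: Dict[str, str],
           asked_slot: str, mixed_initiative: bool) -> Dict[str, str]:
    """Merge a parsed user turn into a copy of the dialogue state.

    System initiative: only the slot the system asked about is accepted.
    Mixed initiative: any slot the user volunteers is accepted.
    """
    new_state = dict(filled)
    for slot, value in parsed.items():
        if mixed_initiative or slot == asked_slot:
            new_state[slot] = value
    return new_state
```

Under system initiative a user who volunteers the date early will be asked for it again later; under mixed initiative the extra information is kept and the system only asks for what is still missing.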

11
Grounding (Clark & Schaefer 89)
  • Conversational participants don't just take turns
    speaking... they try to establish common ground (or
    mutual belief)
  • The hearer (H) must ground the speaker's (S's)
    utterances by making it clear whether or not
    understanding has occurred
  • How do hearers do this?
  • S: I can upgrade you to an SUV at that rate.
  • Continued attention
  • (U gazes appreciatively at S)
  • Relevant next contribution
  • U: Do you have a RAV4 available?

12
  • Acknowledgement/backchannel
  • U: Ok/Mhmmm/Great!
  • Demonstration/paraphrase
  • U: An SUV.
  • Display/repetition
  • U: You can upgrade me to an SUV at the same rate?
  • Request for repair
  • U: I beg your pardon?

13
Detecting Grounding Behavior
  • Evidence of system misconceptions is reflected in
    user responses (Krahmer et al. 99, 00)
  • Responses to incorrect verifications:
  • contain more words (or are empty)
  • show marked word order (especially after implicit
    verifications)
  • contain more disconfirmations, more
    repeated/corrected info
  • "No" after incorrect verifications vs. other
    yes-no questions:
  • has higher boundary tone
  • wider pitch range
  • longer duration
  • longer pauses before and after
  • more additional words after it

14
  • User information state is reflected in the response
    (Shimojima et al. 99, 01)
  • Echoic responses repeat prior information as
    acknowledgment or request for confirmation
  • S1: Then go to Keage station.
  • S2: Keage.
  • Experiment:
  • Identify degree of integration and prosodic
    features (boundary tone, pitch range, tempo,
    initial pause)
  • Perception studies to elicit integration effect
  • Results: fast tempo, little pause, and low pitch
    signal high integration

15
Grounding and Confirmation Strategies
  • U: I want to go to Baltimore.
  • Explicit
  • S: Did you say you want to go to Baltimore?
  • Implicit
  • S: Baltimore. (H* L-L%)
  • S: Baltimore? (L* H-H%)
  • S: What time do you want to leave Baltimore?
  • No confirmation
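
The three strategies on this slide can be sketched as a prompt builder. Two assumptions are layered on top of the slide: choosing the strategy from a recognizer confidence score is a common design but is not stated here, and the thresholds 0.5 and 0.9 are hypothetical. The implicit prompt reuses the wording of the mixed-initiative example from the earlier initiative slide.

```python
def choose_strategy(confidence: float,
                    low: float = 0.5, high: float = 0.9) -> str:
    """Pick a confirmation strategy from a recognizer confidence in [0, 1].

    The thresholds are hypothetical defaults, not values from the slides.
    """
    if confidence < low:
        return "explicit"
    if confidence < high:
        return "implicit"
    return "none"


def confirmation_prompt(city: str, strategy: str) -> str:
    """Build the system's next turn after hearing a destination city."""
    if strategy == "explicit":
        # Ask directly whether the recognized value is right.
        return f"Did you say you want to go to {city}?"
    if strategy == "implicit":
        # Embed the recognized value in the next question.
        return f"What day do you want to go to {city}?"
    if strategy == "none":
        # Move on without repeating the value.
        return "What day do you want to go?"
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Explicit confirmation costs an extra turn but makes errors easy to catch; implicit confirmation is faster but relies on the user noticing and correcting a wrong value.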

16
How do we evaluate Dialogue Systems?
  • PARADISE framework (Walker et al 00)
  • Performance of a dialogue system is affected
    both by what gets accomplished by the user and
    the dialogue agent and how it gets accomplished

[Figure: PARADISE objectives. Maximize Task Success; Minimize Costs (Efficiency Measures, Qualitative Measures)]

17
What metrics should we use?
  • Efficiency of the Interaction: User Turns, System
    Turns, Elapsed Time
  • Quality of the Interaction: ASR rejections, Time
    Out Prompts, Help Requests, Barge-Ins, Mean
    Recognition Score (concept accuracy),
    Cancellation Requests
  • User Satisfaction
  • Task Success: perceived completion, information
    extracted

18
User Satisfaction: Sum of Many Measures
  • Was Annie easy to understand in this
    conversation? (TTS Performance)
  • In this conversation, did Annie understand what
    you said? (ASR Performance)
  • In this conversation, was it easy to find the
    message you wanted? (Task Ease)
  • Was the pace of interaction with Annie
    appropriate in this conversation? (Interaction
    Pace)
  • In this conversation, did you know what you could
    say at each point of the dialog? (User Expertise)
  • How often was Annie sluggish and slow to reply to
    you in this conversation? (System Response)
  • Did Annie work the way you expected her to in
    this conversation? (Expected Behavior)
  • From your current experience with using Annie to
    get your email, do you think you'd use Annie
    regularly to access your mail when you are away
    from your desk? (Future Use)

19
Performance Model
  • Weights trained for each independent factor via
    multiple regression modeling: how much does each
    contribute to User Satisfaction?
  • Result useful for system development:
  • Making predictions about system modifications
  • Distinguishing good dialogues from bad
    dialogues
  • But can we also tell on-line when a dialogue is
    going wrong?
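
The weight-fitting step can be illustrated with an ordinary least-squares fit in plain Python. This is a sketch of multiple regression in general, not the PARADISE implementation: the function name `fit_weights` is hypothetical, and the published framework also normalizes each factor before fitting, which is omitted here.

```python
from typing import List, Sequence


def fit_weights(rows: Sequence[Sequence[float]],
                targets: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    rows    -- one list of factor values per dialogue
    targets -- the user-satisfaction score for each dialogue
    Returns [intercept, w_1, ..., w_k].
    """
    # Prepend a constant 1.0 so the first weight is the intercept.
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("factors are collinear; weights are not unique")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # After elimination the last column holds the solution.
    return [A[i][k] for i in range(k)]
```

Each row might hold, for example, a task-success score and a turn count for one dialogue; the sign and size of each fitted weight then indicate how that factor relates to the satisfaction scores in the sample.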

20
Identifying Misrecognitions, Awares and User
Corrections Automatically (Hirschberg, Litman &
Swerts)
  • Collect corpus from interactive voice response
    system
  • Identify speaker turns:
  • incorrectly recognized
  • where speakers are first aware of an error
  • that correct misrecognitions
  • Identify prosodic features of turns in each
    category and compare to other turns
  • Use Machine Learning techniques to train a
    classifier to make these distinctions
    automatically
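
As a toy illustration of the last step, the sketch below trains a one-feature threshold classifier (a decision stump) on a single prosodic measurement per turn. This is a deliberately simple stand-in and not the learner used in the study; the names `train_stump` and `predict` are hypothetical, and any feature values passed in would have to come from real annotated turns.

```python
from typing import List, Tuple


def train_stump(values: List[float], labels: List[bool]) -> Tuple[float, bool]:
    """Pick the threshold on one feature that minimizes training errors.

    Returns (threshold, above_is_positive): a turn whose value exceeds the
    threshold is predicted positive when above_is_positive is True, and
    negative otherwise.
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best_errors = len(values) + 1
    best = (candidates[0], True)
    for threshold in candidates:
        for above_is_positive in (True, False):
            errors = sum(((v > threshold) == above_is_positive) != y
                         for v, y in zip(values, labels))
            if errors < best_errors:
                best_errors = errors
                best = (threshold, above_is_positive)
    return best


def predict(value: float, threshold: float, above_is_positive: bool) -> bool:
    """Classify one turn with a trained stump."""
    return (value > threshold) == above_is_positive
```

A real system would combine many prosodic and recognizer features and evaluate on held-out turns; the stump only shows the shape of the train-then-predict loop.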

21
Turn Types
  • TOOT: Hi. This is AT&T Amtrak Schedule System.
    This is TOOT. How may I help you?
  • User: Hello. I would like trains from Philadelphia
    to New York leaving on Sunday at ten thirty in the
    evening.
  • TOOT: Which city do you want to go to?
  • User: New York.
  • Turn labels in the figure: misrecognition,
    correction, aware site

22
Results
  • Reduced error in predicting misrecognized turns
    to 8.64%
  • Error in predicting awares (12%)
  • Error in predicting corrections (18-21%)

23
Conclusions
  • Spoken dialogue systems present new problems,
    but also new possibilities
  • Recognizing speech introduces a new source of
    errors
  • Additional information provided in the speech
    stream offers new information about users'
    intended meanings, emotional state (grounding of
    information, speech acts, reaction to system
    errors)
  • Why spoken dialogue systems rather than web-based
    interfaces?