Title: Discourse Structure in Generation
Today
- Models of Discourse Structure
- Do we have them?
- Grosz & Sidner 86
- What identifies discourse structure to Hearers?
- Textual cues
- Spoken cues
- How can we produce appropriate discourse structure in TTS systems?
- Can we identify discourse structure automatically, from speech?
Is there structure in this discourse?
- A beautiful mallard spotted the dove I was feeding.
- The duck dove supply is small this year.
- That dove was history in a minute.
- Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa.
- To my surprise, I ran into a friend from back home.
- When I told her of my recent experience she questioned my sanity.
Is this a reasonable structure? This? This?
- [Three candidate segmentations of the discourse above; the bracketing drawn on the slides is not preserved.]
What information do we use in segmenting a discourse?
- Topic coherence?
- Repeated reference?
- Cue phrases?
- ????
Structures of Discourse Structure (Grosz & Sidner 86)
- A leading theory of discourse structure
- Based upon Speaker intentions and Speaker and Hearer attentional state
- Identifies a few general relations that hold among Speaker intentions
- Identifies a model of attentional state
- Three components
- Linguistic structure
- Intentional structure
- Attentional structure
Linguistic Structure
- What is actually said or written
- How is the linguistic structure represented?
- Assume discourse is segmented into Discourse Segments (DS)
- What is the basic unit of analysis?
- Do we all segment alike?
- Do we all use the same cues?
Linguistic Structure of Discourse D
- S1: A beautiful mallard spotted the dove I was feeding.
- The duck dove supply is small this year.
- That dove was history in a minute.
- S2: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa.
- To my surprise, I ran into a friend from back home.
- When I told her of my recent experience she questioned my sanity.
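As a rough illustration of how a single textual cue could yield the S1/S2 split above, the sketch below opens a new segment whenever a sentence starts with a cue phrase. The CUE_PHRASES list and the segment function are assumptions made for this example, not part of any theory or system cited in these slides, and a real segmenter would need far more than one cue.

```python
# Hypothetical, deliberately tiny cue-phrase list; real inventories are much larger.
CUE_PHRASES = ("well", "now", "anyway", "so", "first", "next", "finally")


def starts_with_cue(sentence):
    """Return True if the first word of the sentence is a listed cue phrase."""
    words = sentence.split()
    if not words:
        return False
    first = words[0].strip(",.;:!?").lower()
    return first in CUE_PHRASES


def segment(sentences):
    """Group sentences into segments, opening a new one at each cue phrase."""
    segments = []
    for sentence in sentences:
        if not segments or starts_with_cue(sentence):
            segments.append([])
        segments[-1].append(sentence)
    return segments


DISCOURSE_D = [
    "A beautiful mallard spotted the dove I was feeding.",
    "The duck dove supply is small this year.",
    "That dove was history in a minute.",
    "Well, to recover from this horrible scene, I went to the park snack bar "
    "for a cup of cocoa.",
    "To my surprise, I ran into a friend from back home.",
    "When I told her of my recent experience she questioned my sanity.",
]

segments = segment(DISCOURSE_D)
for i, seg in enumerate(segments, start=1):
    print(f"S{i}: {len(seg)} sentences, starting with {seg[0].split()[0]!r}")
```

On Discourse D the only cue-initial sentence is the one beginning "Well," so the sketch produces two segments of three sentences each. It cannot find finer structure, which is one reason to look at other cues as well.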
Intentional Structure
- Discourse purpose (DP): the basic purpose of the Speaker in producing the discourse
- Discourse segment purposes (DSPs): the Speaker's purpose in producing each segment
- Segments are related to one another by their purposes
- Satisfaction-precedence: DSP1 must be satisfied before DSP2
- Dominance: DSP1 dominates DSP2 if fulfilling DSP2 constitutes part of fulfilling DSP1
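The two relations can be pictured as a small data structure. The sketch below is only one possible encoding: the Purpose class, the purpose names, and the choice to treat both relations as transitive are assumptions of this example, and the purposes themselves are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Purpose:
    """A discourse (segment) purpose and the purposes it directly dominates."""
    name: str
    children: list = field(default_factory=list)

    def dominates(self, other):
        """True if fulfilling `other` is part of fulfilling this purpose."""
        return any(
            child is other or child.dominates(other) for child in self.children
        )


def satisfaction_precedes(pairs, first, second):
    """True if `first` must be satisfied before `second`.

    `pairs` holds direct (earlier, later) name pairs; chains of pairs are
    followed, which is an assumption of this sketch.
    """
    frontier, seen = [first], set()
    while frontier:
        current = frontier.pop()
        for earlier, later in pairs:
            if earlier == current and later not in seen:
                if later == second:
                    return True
                seen.add(later)
                frontier.append(later)
    return False


# Hypothetical purposes: DP dominates three DSPs, and DSP2 has one sub-purpose.
dsp2a = Purpose("DSP2a")
dsp1, dsp2, dsp3 = Purpose("DSP1"), Purpose("DSP2", [dsp2a]), Purpose("DSP3")
dp = Purpose("DP", [dsp1, dsp2, dsp3])

# Hypothetical ordering constraints between sibling purposes.
precedence = {("DSP1", "DSP2"), ("DSP2", "DSP3")}

print(dp.dominates(dsp2a))                                 # dominance via DSP2
print(dsp1.dominates(dsp2))                                # siblings: no dominance
print(satisfaction_precedes(precedence, "DSP1", "DSP3"))   # via DSP2
```

Dominance is stored as a tree and satisfaction-precedence as a separate set of ordered pairs, so the two relations stay independent of each other.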
Intentional Structure of Discourse D
- DSP1: Describe murder of dove by duck.
- S1: A beautiful mallard spotted the dove I was feeding.
- The duck dove supply is small this year.
- That dove was history in a minute.
- DSP2: Describe meeting of old friend.
- S2: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa.
- To my surprise, I ran into a friend from back home.
- When I told her of my recent experience she questioned my sanity.
- DSP2: Describe recovery process.
- S2
- DSP3: Describe snack.
- S3: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa.
- DSP4: Describe meeting old friend.
- S4: To my surprise, I ran into a friend from back home.
- DSP5: Describe friend's reaction.
- S5: When I told her of my recent experience she questioned my sanity.
Attentional State: The Focus Stack
- Stack of focus spaces, each containing the objects, properties and relations salient during each DS, plus the DSP
- State changes: transition rules controlling the addition/deletion of focus spaces
- Information at lower levels may or may not be available at higher levels
- Focus spaces are pushed onto the stack when
- A new DS is begun
- An embedded DS (e.g. a DS dominated by another DS) is begun
- Focus spaces are popped when their DS is completed
- State of the focus stack models felicitous reference and coherence in discourse
Focus stack (top first):
- S2: DSP2, scene, Speaker, snack_bar, cocoa, friend, home, sanity
- S1: DSP1, duck, dove, Speaker, duck_dove_supply
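The push and pop behavior can be sketched in a few lines. The FocusStack class and its resolve method are hypothetical names introduced for this example. The sketch also assumes that lower focus spaces are always searchable, whereas the theory only says such information may or may not be available.

```python
class FocusStack:
    """Stack of focus spaces: segment label, its DSP, and salient entities."""

    def __init__(self):
        self._spaces = []

    def push(self, segment, dsp, entities):
        """Open a focus space when a new (possibly embedded) segment begins."""
        self._spaces.append(
            {"segment": segment, "dsp": dsp, "entities": set(entities)}
        )

    def pop(self):
        """Close the topmost focus space when its segment is completed."""
        return self._spaces.pop()

    def resolve(self, entity):
        """Return the topmost segment in which the entity is salient, else None."""
        for space in reversed(self._spaces):
            if entity in space["entities"]:
                return space["segment"]
        return None


# Entities copied from the focus spaces listed for Discourse D.
stack = FocusStack()
stack.push("S1", "DSP1", ["duck", "dove", "Speaker", "duck_dove_supply"])
stack.push("S2", "DSP2", ["scene", "Speaker", "snack_bar", "cocoa",
                          "friend", "home", "sanity"])

print(stack.resolve("Speaker"))   # found in the topmost space first
print(stack.resolve("dove"))      # only salient in the lower space

closed = stack.pop()              # S2 is completed
print(closed["segment"], stack.resolve("cocoa"))
```

Searching from the top down is what makes the most recently opened segment the preferred source of referents. Once a space is popped, its entities are no longer found.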
Limits of the Theory
- Assumes discourses are task-oriented
- Assumes a single, hierarchical structure shared by S and H
- Questions
- Do people really build such structures when they converse?
- Use them in interpreting what others say?
- How could they do it?
How might people recognize discourse structure?
- Linguistic markers?
- tense and aspect
- cue phrases
- Inference of Speaker intentions?
- Inference from task structure?
- Intonational Information?
Acoustic and Prosodic Cues to Discourse Structure
- Intuition
- Speakers vary acoustic and prosodic cues to convey variation in discourse structure
- Systematic? In read or spontaneous speech?
- Evidence
- Observations from recorded corpora
- Laboratory experiments
- Machine learning of discourse structure from acoustic/prosodic features
Prosodic Correlates of Discourse/Topic Structure
- Pitch range
- Lehiste 75, Brown et al 83, Silverman 86, Avesani & Vayra 88, Ayers 92, Swerts et al 92, Grosz & Hirschberg 92, Swerts & Ostendorf 95, Hirschberg & Nakatani 96
- Preceding pause
- Lehiste 79, Chafe 80, Brown et al 83, Silverman 86, Woodbury 87, Avesani & Vayra 88, Grosz & Hirschberg 92, Passonneau & Litman 93, Hirschberg & Nakatani 96
- Rate
- Butterworth 75, Lehiste 80, Grosz & Hirschberg 92, Hirschberg & Nakatani 96
- Amplitude
- Brown et al 83, Grosz & Hirschberg 92, Hirschberg & Nakatani 96
- Contour
- Brown et al 83, Woodbury 87, Swerts et al 92
Issues
- Do we find significant and reliable cues to discourse structure in prosodic variation
- When tested against an independent theory of discourse structure?
- In spontaneous as well as read speech?
- Are Hearers' interpretations of discourse structure influenced by intonational variation?
Grosz & Hirschberg 92
- Small corpus of read AP newswire
- Read by professional speaker
- Labeled for discourse structure from text alone or from text and speech
- Pre-ToBI labeled
- Acoustic-prosodic features extracted for each intermediate (level 3) phrase
- Pitch range and change from prior phrase
- Intensity (RMS) and change in dB from prior phrase
- Preceding and subsequent pause
- Speaking rate
- Analysis of phrases in different segment positions: SBEG (segment-beginning), SF (segment-final), parentheticals, quoted speech
- ANOVAs and t-tests on means
- Results
- Direct quotes: larger pitch range
- Parentheticals: smaller range, negative change from prior phrase, negative change in dB, faster rate
- SBEG: larger range, louder, greater preceding pause, less subsequent pause
- SF: greater subsequent pause
- Machine learning experiments identified
- SBEG with 91.5% est. accuracy (cross-validation)
- SF, 92.5%
- Attributive tags, 96.9%
- Direct quotations, 86.4%
- Indirect quotations, 88.5%
- Parentheticals, 89.2%
- Conclusion: Acoustic/prosodic information is available to permit Hearers to identify discourse structure
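To make the per-phrase features concrete, here is a minimal sketch of how they might be computed from phrase-level measurements. It is not the procedure used in the study. The dictionary keys, the use of peak F0 as a stand-in for pitch range, and syllables per second as the rate measure are all assumptions of this example, and the toy values are invented purely to exercise the code.

```python
import math


def rms_db(samples):
    """RMS level of a phrase in dB relative to full scale (samples in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log10(0) on silence


def phrase_features(phrases):
    """Compute per-phrase features plus change from the prior phrase.

    Each phrase is a dict with hypothetical keys: samples (floats),
    f0 (Hz values for voiced frames), start and end (seconds), syllables.
    """
    rows = []
    for i, p in enumerate(phrases):
        prev = phrases[i - 1] if i > 0 else None
        nxt = phrases[i + 1] if i + 1 < len(phrases) else None
        row = {
            "pitch_range": max(p["f0"]),  # assumption: summarized by peak F0
            "intensity_db": rms_db(p["samples"]),
            "preceding_pause": p["start"] - prev["end"] if prev else None,
            "subsequent_pause": nxt["start"] - p["end"] if nxt else None,
            "rate": p["syllables"] / (p["end"] - p["start"]),  # syllables/sec
        }
        if rows:
            row["pitch_change"] = row["pitch_range"] - rows[-1]["pitch_range"]
            row["db_change"] = row["intensity_db"] - rows[-1]["intensity_db"]
        else:
            row["pitch_change"] = row["db_change"] = None
        rows.append(row)
    return rows


# Invented toy measurements for two consecutive phrases.
TOY_PHRASES = [
    {"samples": [0.5, -0.5, 0.5, -0.5], "f0": [180.0, 220.0, 200.0],
     "start": 0.0, "end": 1.0, "syllables": 5},
    {"samples": [0.25, -0.25, 0.25, -0.25], "f0": [150.0, 170.0, 160.0],
     "start": 1.5, "end": 2.5, "syllables": 4},
]

for row in phrase_features(TOY_PHRASES):
    print(row)
```

Rows like these, one per phrase, are the kind of input that the ANOVAs, t-tests, and classifiers mentioned above would operate on.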
Next
- The midterm
- Closed book, no notes or electronic devices
- Will include material through today