1
Discourse
2
Today
  • Homework 3
  • Final Project
  • Word Sense Disambiguation
  • Improving queries
  • Information retrieval tasks
  • Discourse
  • Pronoun Resolution
  • Discourse Cohesion

3
Homework 3 (HW 3)
  • Instead of POS tagging, your task is to do word
    sense disambiguation.
  • 1. Using
  • The following training set from the Brown corpus
    (/home/classes/guinnc/475/hw3/trainingset.txt)
    and
  • The definition of "run" given on WordNet
    (http://wordnet.princeton.edu/perl/webwn)
  • 2. Determine the preferred meaning of each use of
    the word "run" or "runs" in trainingset.txt
  • 214 uses
  • Due Wednesday, March 14
  • 3. Using any algorithm or approach described in
    our book, write code that will have the computer
    automatically determine the appropriate meaning
    sense of the word "run" or "runs" within the
    context of a sentence (a Lesk sketch follows this
    list).
  • Apply this algorithm to the training set and see
    how well you do
  • Due Friday, March 23
  • 4. A new test corpus with previously unseen data
    will be provided to you.
  • Apply your algorithm (unaltered) to this test set
    and see how it does
  • Make changes that might improve your algorithm on
    this test set
  • Due Wednesday, March 28
  • Teams
  • Chris Tripp/Andrew Martin, Tom Starr/Jerry
    Martin, Matt Singletary/Matt Ratliff, Bill
    Shipman/Andrew Cotton, Dan Reeves/Ross
    Cranford, Allen Rawls/Ralph Harris, Bret
    Mohler/Rose Rahiminejad/Jason Forsythe
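One approach described in the book is the Lesk algorithm. Below is a minimal sketch of simplified Lesk over WordNet senses, assuming NLTK and its WordNet data are installed (pip install nltk, then nltk.download('wordnet')); the function name is illustrative, and a real submission would add proper tokenization and stop-word filtering.

from nltk.corpus import wordnet as wn

def simplified_lesk(word, sentence):
    """Pick the WordNet sense whose gloss and example sentences
    share the most words with the sentence context."""
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        # Signature = words of the gloss plus the example sentences
        signature = set(sense.definition().lower().split())
        for example in sense.examples():
            signature |= set(example.lower().split())
        overlap = len(signature & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk('run', 'the batter scored a run in the final inning'))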

4
Final Project
  • Software Project + Written Project Description
  • Due Date: 04/30. Everything!
  • Stages
  • Project Description: 03/28
  • Background Reading: 04/09
  • Progress Report: 04/18
  • Final Submission: 04/30

5
Project Ideas
  • Part-of-speech tagger
  • Noun phrase identifier
  • Word sense disambiguation
  • Text categorization
  • Pronoun anaphora resolution
  • Robust question/answer system
  • Translation
  • CYC (common sense KB and NLP system, ask Bill
    Shipman all about it)
  • NLTK (Natural Language Toolkit)
    http://nltk.sourceforge.net/
  • Other?

6
Project Description (03/28)
  • What you are doing
  • The specific methods you will use
  • How you will test your system
  • Treat this as a formal contract. Don't just
    scribble something off.
  • You and I can iterate but do so before 03/28.

7
Background Reading (04/09)
  • Even if you don't implement the techniques you
    read about, you should become an expert in the
    area you are working in
  • 5 Journal or Conference articles as close to your
    research topic and methods as possible
  • You will submit the citation for each journal or
    conference article and your own original review
    of the article.
  • In particular, summarize the thesis, technique(s)
    employed, and evaluation.
  • What techniques and ideas are particularly
    relevant to your project?

8
Progress Report (04/18)
  • A contractual listing of
  • Items accomplished (and dates of completion)
  • Items left to be done (and expected dates of
    completion)
  • Difficulties encountered
  • Expected solution or work-around to the
    difficulty

9
Final Submission (04/30)
  • Submit
  • Actual Code and instructions for how to compile
    and run it
  • A document describing your project, the
    techniques you used, your evaluation and results.

10
When last we left ...
  • Information Retrieval
  • How do search engines do it?
  • How can they be made better?

11
Evaluating IR Performance
  • Precision = relevant docs returned / total docs
    returned -- how often are you right when you say
    this document is relevant?
  • Recall = relevant docs returned / relevant docs
    in collection -- how many of the relevant
    documents do you find?
  • F-measure combines P and R (worked numbers below)
  • Are P and R equally important?
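A worked sketch of the measures above, with invented counts; the balanced F-measure shown is the harmonic mean of P and R:

relevant_returned = 30        # relevant docs the system returned
total_returned = 50           # everything the system returned
relevant_in_collection = 60   # relevant docs that exist

precision = relevant_returned / total_returned             # 0.6
recall = relevant_returned / relevant_in_collection        # 0.5
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean, ~0.545

print(precision, recall, f_measure)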

12
Improving Queries
  • Relevance feedback: users rate retrieved docs
  • Query expansion: many techniques
  • add top N docs retrieved to query and resubmit
    the expanded query (see the sketch below)
  • WordNet
  • Term clustering: cluster rows of terms in the
    term-by-document matrix to produce synonyms and
    add them to the query
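A hedged sketch of the first expansion technique above (pseudo-relevance feedback); here search stands in for any ranked retrieval function and is not a real library call:

from collections import Counter

def expand_query(query, search, n_docs=5, n_terms=3):
    """Add the most frequent terms from the top-N retrieved docs."""
    top_docs = search(query)[:n_docs]   # assumed: ranked list of doc texts
    counts = Counter()
    for doc in top_docs:
        counts.update(doc.lower().split())
    already = set(query.lower().split())
    # Take the most frequent new terms not already in the query
    extra = [t for t, _ in counts.most_common() if t not in already][:n_terms]
    return query + ' ' + ' '.join(extra)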

13
IR Tasks
  • Ad hoc retrieval: normal IR
  • Routing/categorization: assign a new doc to one
    of a predefined set of categories
  • Clustering: divide a collection into N clusters
  • Segmentation: segment text into coherent chunks
  • Summarization: compress a text by extracting
    summary items or eliminating less relevant items
  • Question-answering: find a span of text (within
    some window) containing the answer to a question

14
Information Extraction
  • Another robust alternative
  • Idea: extract particular types of information
    from arbitrary text or transcribed speech
  • Examples
  • Named entities: people, places, organizations,
    times, dates
  • e.g., "MIPS Vice President John Hime"
  • MUC evaluations
  • Domains: medical texts, broadcast news (terrorist
    reports), ...

15
Reference Resolution Example
  • Gracie: Oh yeah ... and then Mr. and Mrs. Jones
    were having matrimonial trouble, and my brother
    was hired to watch Mrs. Jones.
  • George: Well, I imagine she was a very
    attractive woman.
  • Gracie: She was, and my brother watched her day
    and night for six months.
  • George: Well, what happened?
  • Gracie: She finally got a divorce.
  • George: Mrs. Jones?
  • Gracie: No, my brother's wife.

16
Some Terminology
  • Discourse: anything longer than a single
    utterance or sentence
  • Monologue
  • Dialogue
  • May be multi-party
  • May be human-machine

17
Reference Resolution
  • Process of associating Bloomberg/he/his with a
    particular person and "big budget problem"/it
    with a concept
  • Giuliani left Bloomberg to be mayor of a city
    with a big budget problem. It's unclear how
    he'll be able to handle it during his term.
  • Referring expressions: Giuliani, Bloomberg, he,
    it, his
  • Referents: the person named Bloomberg, the
    concept of a big budget problem

18
  • Co-referring referring expressions: Bloomberg,
    he, his
  • Antecedent: Bloomberg
  • Anaphors: he, his

19
Discourse Model
  • Needed because referring expressions (e.g.,
    Giuliani, Bloomberg, he, it, budget problem)
    encode information about beliefs about the
    referent
  • When a referent is first mentioned in a
    discourse, a representation is evoked in the
    model
  • Information predicated of it is stored also in
    the model
  • On subsequent mention, it is accessed from the
    model

20
Types of Reference
  • Entities, concepts, places, propositions, events,
    ...
  • According to John, Bob bought Sue an Integra, and
    Sue bought Fred a Legend.
  • But that turned out to be a lie. (a speech act)
  • But that was false. (proposition)
  • That struck me as a funny way to describe the
    situation. (manner of description)
  • That caused Sue to become rather poor. (event)
  • That caused them both to become rather poor.
    (combination of multiple events)

21
Reference Phenomena
  • Indefinite noun phrases
  • A homeless man hit up Bloomberg for a dollar.
  • Some homeless guy hit up Bloomberg for a dollar.
  • This homeless man hit up Bloomberg for a dollar.
  • Definite noun phrases
  • The poor fellow only got a lecture.
  • Demonstratives
  • This homeless man got a lecture but that one got
    carted off to jail.

22
  • One-anaphora
  • Clinton used to have a dog called Buddy. Now
    he's got another one.

23
Pronouns
  • A large tiger escaped from the Central Park zoo
    chasing a tiny sparrow. It was recaptured by a
    brave policeman.
  • Referents of pronouns require some degree of
    salience in the discourse model (as opposed to
    definite and indefinite NPs, e.g.)
  • How do items become salient in discourse?

24
Salience via Simple Recency
  • He had dodged the press for 36 hours, but
    yesterday the Buck House Butler came out of the
    cocoon of his room at the Millennium Hotel in New
    York and shoveled some morsels the way of the
    panting press. First there was a brief, if
    obviously self-serving, statement, and then, in
    good royal tradition, a walkabout.

25
Salience via Structural Recency
  • E: So you have the engine assembly finished. Now
    attach the rope. By the way, did you buy the gas
    can today?
  • A: Yes.
  • E: Did it cost much?
  • A: No.
  • E: OK, good. Have you got it attached yet?

26
Inferables
  • I almost bought an Acura Integra today, but a
    door had a dent and the engine seemed noisy.
  • Mix the flour, butter, and water. Knead the dough
    until smooth and shiny.

27
Discontinuous Sets
  • Entities evoked together but mentioned in
    different sentences or phrases
  • John has a St. Bernard and Mary has a Yorkie.
    They arouse some comment when they walk them in
    the park.
  • John has a St. Bernard. Mary has a Yorkie. They
    arouse some comment when they walk them in the
    park.

28
Generics
  • I saw two Corgis and their seven puppies today.
    They are the funniest dogs!

29
Constraints on Coreference
  • Number agreement
  • John's parents like opera. John hates it / John
    hates them.
  • Person and case agreement
  • Nominative: I, we, you, he, she, they
  • Accusative: me, us, you, him, her, them
  • Genitive: my, our, your, his, her, their
  • George and Edward brought bread and cheese. They
    shared them.

30
  • Gender agreement
  • John has a Porsche. He/it/she is attractive.
  • Syntactic constraints: binding theory
  • John bought himself a new Volvo. (himself = John)
  • John bought him a new Volvo. (him ≠ John)
  • Selectional restrictions
  • John left his plane in the hangar.
  • He had flown it from Memphis this morning.

31
Pronoun Interpretation Preferences
  • Recency
  • John bought a new boat. Bill bought a bigger
    one. Mary likes to sail it.
  • But grammatical role raises its ugly head
  • John went to the Acura dealership with Bill. He
    bought an Integra.
  • Bill went to the Acura dealership with John. He
    bought an Integra.
  • ?John and Bill went to the Acura dealership. He
    bought an Integra.

32
  • And so does repeated mention
  • John needed a car to go to his new job. He
    decided that he wanted something sporty. Bill
    went to the dealership with him. He bought a
    Miata.
  • Who bought the Miata?
  • What about grammatical role preference?
  • Parallel constructions
  • Saturday, Mary went with Sue to the farmers'
    market.
  • Sally went with her to the bookstore.
  • Sunday, Mary went with Sue to the mall.
  • Sally told her she should get over her shopping
    obsession.

33
  • Verb semantics/thematic roles
  • John telephoned Bill. He'd lost the directions
    to his house.
  • John criticized Bill. He'd lost the directions
    to his house.

34
Pragmatics
  • Context-dependent meaning
  • Jeb Bush was helped by his brother and so was
    Frank Lautenberg. (Strict vs. Sloppy)
  • Mike Bloomberg bet George Pataki a baseball cap
    that he could/couldn't run the marathon in under
    3 hours.
  • Mike Bloomberg bet George Pataki a baseball cap
    that he could/couldn't be hypnotized in under 1
    minute.

35
Summary: What Factors Affect Reference Resolution?
  • Lexical factors
  • Reference type: inferability, discontinuous sets,
    generics, one-anaphora, pronouns, ...
  • Discourse factors
  • Recency
  • Focus/topic structure, digression
  • Repeated mention
  • Syntactic factors
  • Agreement: gender, number, person, case
  • Parallel construction
  • Grammatical role

36
  • Selectional restrictions
  • Semantic/lexical factors
  • Verb semantics, thematic role
  • Pragmatic factors

37
Anaphora resolution
  • Finding in a text all the referring expressions
    that have one and the same denotation
  • Pronominal anaphora resolution
  • Anaphora resolution between named entities
  • Full noun phrase anaphora resolution

38
Reference Resolution
  • Given these types of constraints, can we
    construct an algorithm that will apply them such
    that we can identify the correct referents of
    anaphors and other referring expressions?

39
Issues
  • Which constraints/features can/should we make use
    of?
  • How should we order them? i.e. which override
    which?
  • What should be stored in our discourse model?
    I.e., what types of information do we need to
    keep track of?
  • How to evaluate?

40
Three Algorithms
  • Lappin & Leass '94: weighting via recency and
    syntactic preferences
  • Hobbs '78: syntax tree-based referential search
  • Centering (Grosz, Joshi, Weinstein '95 and
    various): discourse-based search

41
Lappin & Leass '94
  • Weights candidate antecedents by recency and
    syntactic preference (86% accuracy)
  • Two major functions to perform
  • Update the discourse model when an NP that evokes
    a new entity is found in the text, computing the
    salience of this entity for future anaphora
    resolution
  • Find most likely referent for current anaphor by
    considering possible antecedents and their
    salience values

42
Saliency Factor Weights
  • Sentence recency (in current sentence?): 100
  • Subject emphasis (is it the subject?): 80
  • Existential emphasis (existential predicate
    nominal?): 70
  • Accusative emphasis (is it the direct object?): 50
  • Indirect object/oblique complement emphasis: 40
  • Non-adverbial emphasis (not in a PP): 50
  • Head noun emphasis (is the head noun): 80

43
  • Implicit ordering of arguments:
  • subject > existential predicate nominal > object
    > indirect object or oblique > adverbial PP
  • On the sofa, the cat was eating bonbons.
  • sofa: 100 + 80 = 180
  • cat: 100 + 80 + 50 + 80 = 310
  • bonbons: 100 + 50 + 50 + 80 = 280
  • Update
  • Weights accumulate over time
  • Cut in half after each sentence processed
  • Salience values for subsequent referents
    accumulate for the equivalence class of
    co-referential items (exceptions, e.g. multiple
    references in the same sentence)

44
  • The bonbons were clearly very tasty.
  • sofa: 180/2 = 90
  • cat: 310/2 = 155
  • bonbons: 280/2 + (100 + 80 + 50 + 80) = 450
  • Additional salience weights for grammatical role
    parallelism (+35) and cataphora (-175) are
    calculated when a pronoun is to be resolved
  • Additional constraints on gender/number
    agreement/syntax
  • They were a gift from an unknown admirer.
  • sofa: 90/2 = 45
  • cat: 155/2 = 77.5
  • bonbons: 450/2 = 225, +35 = 260

45
Reference Resolution
  • Collect potential referents (up to four sentences
    back): sofa, cat, bonbons
  • Remove those that don't agree in number/gender
    with the pronoun, leaving bonbons
  • Remove those that don't pass intra-sentential
    syntactic coreference constraints
  • The cat washed it. (it ≠ cat)
  • Add applicable values for role parallelism (+35)
    or cataphora (-175) to the current salience value
    of each potential antecedent
  • Select the referent with the highest salience; if
    tied, select the closest referent in the string
    (a bookkeeping sketch follows below)
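A minimal bookkeeping sketch of the procedure above, reusing the factor weights from the earlier slide; the mention/agreement plumbing is illustrative, not the full Lappin & Leass algorithm:

WEIGHTS = {'recency': 100, 'subject': 80, 'existential': 70,
           'accusative': 50, 'indirect_object': 40,
           'non_adverbial': 50, 'head_noun': 80}

salience = {}  # entity -> accumulated salience value

def mention(entity, factors):
    """Add the applicable factor weights for a new mention."""
    salience[entity] = salience.get(entity, 0) + sum(WEIGHTS[f] for f in factors)

def end_of_sentence():
    """Cut every value in half after each sentence is processed."""
    for entity in salience:
        salience[entity] /= 2

def resolve(agrees):
    """Pick the agreeing candidate with the highest salience."""
    candidates = [e for e in salience if agrees(e)]
    return max(candidates, key=salience.get, default=None)

# "On the sofa, the cat was eating bonbons."
mention('sofa', ['recency', 'head_noun'])                                    # 180
mention('cat', ['recency', 'subject', 'non_adverbial', 'head_noun'])         # 310
mention('bonbons', ['recency', 'accusative', 'non_adverbial', 'head_noun'])  # 280
end_of_sentence()  # halve before the next sentence: 90, 155, 140

# Resolving "they" later would call, e.g.:
# resolve(lambda e: e == 'bonbons')   # only bonbons agrees in number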

46
Text Coherence
  • Example
  • (1) John hid Bill's car keys.
  • (2) He was drunk.
  • (1) John hid Bill's car keys.
  • (2) He likes junk food.
  • (1) George Bush supports big business.
  • (2) He's sure to veto House Bill 1711.
  • Hearers try to find connections between
    utterances in a discourse.
  • The possible connections between utterances can
    be specified as a set of coherence relations.

47
Coherence relations (Hobbs, 1979)
  • Result: S0 causes S1.
  • John bought an Acura. His father went ballistic.
  • Explanation: S1 causes S0.
  • John hid Bill's car keys. He was drunk.
  • Parallel: S0 and S1 are parallel.
  • John bought an Acura. Bill bought a BMW.
  • Elaboration: S1 is an elaboration of S0.
  • John bought an Acura this weekend. He purchased
    it for 40 thousand dollars.

48
Discourse structure
  • S1: John took a train to Bill's car dealership.
  • S2: He needed to buy a car.
  • S3: The company he works for now isn't near any
    public transportation.
  • S4: He also wanted to talk to Bill about their
    softball leagues.


[Diagram: Explanation (S2 explains S1)]
49
Discourse structure
  • S1: John took a train to Bill's car dealership.
  • S2: He needed to buy a car.
  • S3: The company he works for now isn't near any
    public transportation.
  • S4: He also wanted to talk to Bill about their
    softball leagues.



[Diagram: Explanation (S2 explains S1); Parallel
(S2 and S4)]
50
Discourse structure
  • S1: John took a train to Bill's car dealership.
  • S2: He needed to buy a car.
  • S3: The company he works for now isn't near any
    public transportation.
  • S4: He also wanted to talk to Bill about their
    softball leagues.



[Diagram: Explanation (S2 explains S1); Explanation
(S3 explains S2); Parallel (S2 and S4)]
51
Discourse parsing
Explanation (e1)
├── S1 (e1)
└── Parallel (e2; e4)
    ├── Explanation (e2)
    │   ├── S2 (e2)
    │   └── S3 (e3)
    └── S4 (e4)
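One illustrative way to hold this parse in code is as nested (relation, child, child) tuples; purely a representation sketch, not a discourse parser:

# The parse above: a leaf is a segment label, an internal
# node is a (relation, child, child) tuple.
tree = ('Explanation',
        'S1',
        ('Parallel',
         ('Explanation', 'S2', 'S3'),
         'S4'))

def segments(node):
    """Collect the leaf segments of a discourse tree."""
    if isinstance(node, str):
        return [node]
    relation, *children = node
    return [s for child in children for s in segments(child)]

print(segments(tree))  # ['S1', 'S2', 'S3', 'S4']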
52
Why compute discourse structure?
  • Natural language understanding
  • Summarization
  • Information retrieval
  • Natural language generation
  • Reference resolution

53
Two theories on discourse structure
  • Mann and Thompson's Rhetorical Structure Theory
    (1988)
  • Grosz and Sidner's attention, intention, and
    structure of discourse (1986)

54
Rhetorical structure theory (RST)
  • Mann and Thompson (1988)
  • One theory of discourse structure, based on
    identifying relations between parts of the text
  • Defined 20 rhetorical relations
  • Presentational relations: intentional
  • Subject matter relations: informational
  • Nucleus: central segment of text
  • Satellite: more peripheral segment
  • Relation definitions and more ...

55
Presentational relations
  • Those whose intended effect is to increase some
    inclination in the hearer.
  • Relations: Antithesis, Background, Concession,
    Enablement, Evidence, Justify, Motivation,
    Preparation, Restatement, Summary

56
Subject matter relations
  • Those whose intended effect is that the hearer
    recognize the relation in question.
  • Relations: Circumstance, Condition, Elaboration,
    Evaluation, Interpretation, Means, Non-volitional
    cause, Non-volitional result, Otherwise, Purpose,
    Solutionhood, Unconditional, Unless, Volitional
    cause, Volitional result

57
Multinuclear relations
  • Contrast
  • Joint
  • List
  • Multinuclear restatement
  • Sequence

58
Some examples
  • Explanation: John went to the coffee shop. He
    was sleepy.
  • Elaboration: John likes coffee. He drinks it
    every day.
  • Contrast: John likes coffee. Mary hates it.

59
Discourse structure
cause
├── contrast
│   ├── elaboration
│   │   ├── John likes coffee.
│   │   └── He drinks it every day.
│   └── Mary hates coffee.
└── They argue a lot.
60
A relation: Evidence
  • (a) George Bush supports big business.
  • (b) He's sure to veto House Bill 1711.
  • Relation name: Evidence
  • Constraints on Nucl: H might not believe Nucl to
    a degree satisfactory to S.
  • Constraints on Sat: H believes Sat or will find
    it credible.
  • Constraints on Nucl+Sat: H's comprehending Sat
    increases H's belief of Nucl.
  • Effect: H's belief of Nucl is increased.

61
A relation: Volitional-Cause
  • (a) George Bush supports big business.
  • (b) He's sure to veto House Bill 1711.
  • Relation name: Volitional-Cause
  • Constraints on Nucl: presents a volitional action
  • Constraints on Sat: none
  • Constraints on Nucl+Sat: Sat presents a situation
    that could have caused the agent of the
    volitional action in Nucl to perform the action.
  • Effect: H recognizes the situation presented in
    Sat as a cause for the volitional action
    presented in Nucl.

62
Another example
  • S: (a) Come home by 5:00. (b) Then we can go to
    the hardware store before it closes. (c) That way
    we can finish the bookshelves tonight.

[Diagram: two analyses. Intentional level: (b) and
(c) provide motivation for (a). Informational
level: (a) is a condition for (b), and (b) is a
condition for (c).]
63
Problems with RST (Moore & Pollack, 1992)
  • How many rhetorical relations are there?
  • How can we use RST in dialogues?
  • How do we incorporate speaker intentions into
    RST?
  • RST does not allow for multiple relations between
    parts of a discourse; informational and
    intentional levels must coexist.

64
Grosz & Sidner (1986)
65
Grosz and Sidner (1986)
  • A leading theory of discourse structure
  • Three components
  • A linguistic structure
  • An intentional structure
  • An attentional state

66
Linguistic structure
  • The structure of the sequence of utterances that
    comprises a discourse.
  • Utterances form Discourse Segments (DSs), and a
    discourse is made up of embedded DSs.
  • What exactly is a DS?
  • Any evidence that humans naturally recognize
    segment boundaries?
  • Do humans agree on segment boundaries?
  • How to find the boundaries automatically?

67
Intentional structure
  • Speakers in a discourse may have many intentions,
    public or private.
  • Discourse purpose (DP): the intention that
    underlies engaging in a discourse.
  • Discourse segment purpose (DSP): the purpose of a
    DS. How does this segment contribute to achieving
    the overall DP?
  • Two relations between DSPs
  • Dominance: if DSP1 contributes to DSP2, we say
    DSP2 dominates DSP1.
  • Satisfaction-precedence: DSP1 must be satisfied
    before DSP2.

68
Attentional State
  • The attentional state is an abstraction of the
    participants' focus of attention as their
    discourse unfolds.
  • The state is a stack of focus spaces.
  • A focus space (FS) is associated with a DS, and
    it contains the DSP and the objects, properties,
    and relations salient in the DS.
  • When a DS starts, its FS is pushed onto the
    stack.
  • When a DS ends, its FS is popped. (A stack
    sketch follows below.)
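A minimal sketch of that stack discipline; the FocusSpace fields follow the slide (a DSP plus the salient objects), while the class itself is illustrative:

from dataclasses import dataclass, field

@dataclass
class FocusSpace:
    dsp: str                                   # the segment's purpose
    salient: set = field(default_factory=set)  # objects/properties/relations

stack = []

def begin_segment(dsp):
    """A DS starts: push its focus space."""
    stack.append(FocusSpace(dsp))

def end_segment():
    """A DS ends: pop its focus space."""
    stack.pop()

begin_segment('C wants A to find a flight for C')  # DS0 opens
begin_segment('A wants to know the destination')   # an embedded DS opens
end_segment()                                      # embedded DS closes; DS0 on top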

69
An example
  • C1: I need to travel in May.
  • A1: And, what day in May do you want to travel?
  • C2: I need to be there for a meeting on the 15th.
  • A2: And you are flying into what city?
  • C3: Seattle.
  • A3: And what time would you like to leave
    Pittsburgh?
  • C4: Hmm. I don't think there are many options
    for non-stop.
  • A4: There are three non-stops today.
  • C5: What are they?
  • ...

[Diagram: nested segment brackets DS0-DS5 over the
dialogue; DS0 spans the whole exchange]
70
Discourse structure with intention info
[Diagram: DS0 spans the dialogue, with DS1 = C1,
DS2 = A1-C2, DS3 = A2-C3, DS4 = A3, DS5 = C4-C7]
  • I0: C wants A to find a flight for C
  • I1: C wants A to know that C is traveling in May.
  • I2: A wants to know the departure date
  • I3: A wants to know the destination
  • I4: A wants to know the departure time
  • I5: C wants A to find a nonstop flight

71
Problems with G&S 1986
  • Assumes that discourses are task-oriented
  • Assumes there is a single, hierarchical structure
    shared by speaker and hearer
  • Do people really build such structures when they
    speak? Do they use them in interpreting what
    others say?

72
Building discourse structure
73
Tasks
  • Identify discourse segment boundaries
  • Determine relations between segments
  • Determine intentions of the segments
  • Determine the attentional state
  • Methods
  • Inference-based approach: symbolic
  • Cue-based approach: statistical

74
Inference-based approach
  • Ex: John hid Bill's car keys. He was drunk.
  • X is drunk → people do not want X to drive
  • People don't want X to drive → people hide X's
    car keys.
  • Abduction

→ AI-complete: requires and utilizes world
knowledge.
75
Cue-based approach
  • Attentional state
  • Attentional changes
  • (push) now, next, but, ...
  • (pop) anyway, in any case, now back to, ok,
    fine, ...
  • True interruption: excuse me, I must interrupt
  • Flashback: oops, I forgot
  • Intention
  • Satisfaction-precedes: first, second,
    furthermore, ...
  • Dominance: for example, first, second, ...

76
Cues (cont)
  • Linguistic structure
  • Elaboration: for example, ...
  • Concession: although
  • Condition: if
  • Sequence: and, first, second, ...
  • Contrast: and, ... (a toy cue classifier follows
    below)
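A toy sketch of the cue-based idea, keying off the push/pop cue phrases listed on the previous slide; real systems learn such cues statistically, and this prefix matching is deliberately naive (pop cues are checked first so that "now back to" is not mistaken for the push cue "now"):

POP_CUES = ('anyway', 'in any case', 'now back to', 'ok', 'fine')
PUSH_CUES = ('now', 'next', 'but')

def attentional_change(utterance):
    """Classify an utterance as a push, pop, or continuation."""
    u = utterance.lower().strip()
    if any(u.startswith(c) for c in POP_CUES):
        return 'pop'
    if any(u.startswith(c) for c in PUSH_CUES):
        return 'push'
    return 'continue'

print(attentional_change('Now back to the engine assembly.'))  # 'pop'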

77
One example
  • (Marcu 1999) Train a parser on a discourse
    treebank.
  • 90 trees, hand-annotated for rhetorical relations
    (RR)
  • Learn to identify Elementary discourse units
    (EDUs)
  • Learn to identify N, S, and their relation.
  • Features: WordNet-based similarity, lexical,
    structural, ...

78
Results
  • Identifying EDUs: 96-98% accuracy
  • Identifying hierarchical structure (2 EDUs are
    related): Recall = 71%, Precision = 84%
  • Identifying nucleus/satellite labels: Recall =
    58%, Precision = 69%
  • Identifying the rhetorical relation: Recall =
    38%, Precision = 45%
  • → Hierarchical structure is easier to identify
    than rhetorical relations.

79
Next Class
  • Monday, March 19
  • Chapter 19