1
Discourse
2
Today
  • Homework 3
  • Final Project
  • Word Sense Disambiguation
  • Improving queries
  • Information retrieval tasks
  • Discourse
  • Pronoun Resolution
  • Discourse Cohesion

3
Homework 3 (HW 3)
  • Instead of POS tagging, your task is to do word
    sense disambiguation.
  • 1. Using
  • The following training set from the Brown corpus
    (/home/classes/guinnc/475/hw3/trainingset.txt)
    and
  • The definition of "run" given on WordNet
    (http://wordnet.princeton.edu/perl/webwn)
  • 2. Determine the preferred meaning of each use of
    the word "run" or "runs" in trainingset.txt
  • 214 uses
  • Due Wednesday, March 14
  • 3. Using any algorithm or approach described in
    our book, write code that will have the computer
    automatically determine the appropriate meaning
    sense of the word "run" or "runs" within the
    context of a sentence (a Lesk sketch follows this
    list).
  • Apply this algorithm to the training set and see
    how well you do
  • Due Friday, March 23
  • 4. A new test corpus with previously unseen data
    will be provided to you.
  • Apply your algorithm (unaltered) to this test set
    and see how it does
  • Make changes that might improve your algorithm on
    this test set
  • Due Wednesday, March 28
  • Teams
  • Chris Tripp/Andrew Martin, Tom Starr/Jerry
    Martin, Matt Singletary/Matt Ratliff, Bill
    Shipman/Andrew Cotton, Dan Reeves/Ross
    Cranford, Allen Rawls/Ralph Harris, Bret
    Mohler/Rose Rahiminejad/Jason Forsythe
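One approach described in the book is the Lesk algorithm. Below is a minimal sketch of simplified Lesk over WordNet senses, assuming NLTK and its WordNet data are installed (pip install nltk, then nltk.download('wordnet')); the function name is illustrative, and a real submission would add proper tokenization and stop-word filtering.

from nltk.corpus import wordnet as wn

def simplified_lesk(word, sentence):
    """Pick the WordNet sense whose gloss and example sentences
    share the most words with the sentence context."""
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        # Signature = words of the gloss plus the example sentences
        signature = set(sense.definition().lower().split())
        for example in sense.examples():
            signature |= set(example.lower().split())
        overlap = len(signature & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk('run', 'the batter scored a run in the final inning'))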

4
Final Project
  • Software Project + Written Project Description
  • Due Date: 04/30. Everything!
  • Stages
  • Project Description: 03/28
  • Background Reading: 04/09
  • Progress Report: 04/18
  • Final Submission: 04/30

5
Project Ideas
  • Part-of-speech tagger
  • Noun phrase identifier
  • Word sense disambiguation
  • Text categorization
  • Pronoun anaphora resolution
  • Robust question/answer system
  • Translation
  • CYC (common sense KB and NLP system, ask Bill
    Shipman all about it)
  • NLTK (Natural Language Toolkit)
    http://nltk.sourceforge.net/
  • Other?

6
Project Description (03/28)
  • What you are doing
  • The specific methods you will use
  • How you will test your system
  • Treat this as a formal contract. Don't just
    scribble something off.
  • You and I can iterate but do so before 03/28.

7
Background Reading (04/09)
  • Even if you don't implement the techniques you
    read about, you should become an expert in the
    area you are working in
  • 5 Journal or Conference articles as close to your
    research topic and methods as possible
  • You will submit the citation for each journal or
    conference article and your own original review
    of the article.
  • In particular, summarize the thesis, technique(s)
    employed, and evaluation.
  • What techniques and ideas are particularly
    relevant to your project?

8
Progress Report (04/18)
  • A contractual listing of
  • Items accomplished (and dates of completion)
  • Items left to be done (and expected dates of
    completion)
  • Difficulties encountered
  • Expected solution or work-around to the
    difficulty

9
Final Submission (04/30)
  • Submit
  • Actual Code and instructions for how to compile
    and run it
  • A document describing your project, the
    techniques you used, your evaluation and results.

10
When last we left ...
  • Information Retrieval
  • How do search engines do it?
  • How can they be made better?

11
Evaluating IR Performance
  • Precision = relevant docs returned / total docs
    returned -- how often are you right when you say
    this document is relevant?
  • Recall = relevant docs returned / relevant docs
    in collection -- how many of the relevant
    documents do you find?
  • F-measure combines P and R (worked numbers below)
  • Are P and R equally important?
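A worked sketch of the measures above, with invented counts; the balanced F-measure shown is the harmonic mean of P and R:

relevant_returned = 30        # relevant docs the system returned
total_returned = 50           # everything the system returned
relevant_in_collection = 60   # relevant docs that exist

precision = relevant_returned / total_returned             # 0.6
recall = relevant_returned / relevant_in_collection        # 0.5
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean, ~0.545

print(precision, recall, f_measure)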

12
Improving Queries
  • Relevance feedback: users rate retrieved docs
  • Query expansion: many techniques
  • add top N docs retrieved to query and resubmit
    the expanded query (see the sketch below)
  • WordNet
  • Term clustering: cluster rows of terms in the
    term-by-document matrix to produce synonyms and
    add them to the query
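A hedged sketch of the first expansion technique above (pseudo-relevance feedback); here search stands in for any ranked retrieval function and is not a real library call:

from collections import Counter

def expand_query(query, search, n_docs=5, n_terms=3):
    """Add the most frequent terms from the top-N retrieved docs."""
    top_docs = search(query)[:n_docs]   # assumed: ranked list of doc texts
    counts = Counter()
    for doc in top_docs:
        counts.update(doc.lower().split())
    already = set(query.lower().split())
    # Take the most frequent new terms not already in the query
    extra = [t for t, _ in counts.most_common() if t not in already][:n_terms]
    return query + ' ' + ' '.join(extra)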

13
IR Tasks
  • Ad hoc retrieval: normal IR
  • Routing/categorization: assign a new doc to one
    of a predefined set of categories
  • Clustering: divide a collection into N clusters
  • Segmentation: segment text into coherent chunks
  • Summarization: compress a text by extracting
    summary items or eliminating less relevant items
  • Question-answering: find a span of text (within
    some window) containing the answer to a question

14
Information Extraction
  • Another robust alternative
  • Idea: extract particular types of information
    from arbitrary text or transcribed speech
  • Examples
  • Named entities: people, places, organizations,
    times, dates
  • e.g., "MIPS Vice President John Hime"
  • MUC evaluations
  • Domains: medical texts, broadcast news (terrorist
    reports), ...

15
Reference Resolution Example
  • Gracie: Oh yeah ... and then Mr. and Mrs. Jones
    were having matrimonial trouble, and my brother
    was hired to watch Mrs. Jones.
  • George: Well, I imagine she was a very
    attractive woman.
  • Gracie: She was, and my brother watched her day
    and night for six months.
  • George: Well, what happened?
  • Gracie: She finally got a divorce.
  • George: Mrs. Jones?
  • Gracie: No, my brother's wife.

16
Some Terminology
  • Discourse: anything longer than a single
    utterance or sentence
  • Monologue
  • Dialogue
  • May be multi-party
  • May be human-machine

17
Reference Resolution
  • Process of associating Bloomberg/he/his with a
    particular person and "big budget problem"/it
    with a concept
  • Giuliani left Bloomberg to be mayor of a city
    with a big budget problem. It's unclear how
    he'll be able to handle it during his term.
  • Referring expressions: Giuliani, Bloomberg, he,
    it, his
  • Referents: the person named Bloomberg, the
    concept of a big budget problem

18
  • Co-referring referring expressions: Bloomberg,
    he, his
  • Antecedent: Bloomberg
  • Anaphors: he, his

19
Discourse Model
  • Needed because referring expressions (e.g.,
    Giuliani, Bloomberg, he, it, budget problem)
    encode information about beliefs about the
    referent
  • When a referent is first mentioned in a
    discourse, a representation is evoked in the
    model
  • Information predicated of it is stored also in
    the model
  • On subsequent mention, it is accessed from the
    model

20
Types of Reference
  • Entities, concepts, places, propositions, events,
    ...
  • According to John, Bob bought Sue an Integra, and
    Sue bought Fred a Legend.
  • But that turned out to be a lie. (a speech act)
  • But that was false. (proposition)
  • That struck me as a funny way to describe the
    situation. (manner of description)
  • That caused Sue to become rather poor. (event)
  • That caused them both to become rather poor.
    (combination of multiple events)

21
Reference Phenomena
  • Indefinite noun phrases
  • A homeless man hit up Bloomberg for a dollar.
  • Some homeless guy hit up Bloomberg for a dollar.
  • This homeless man hit up Bloomberg for a dollar.
  • Definite noun phrases
  • The poor fellow only got a lecture.
  • Demonstratives
  • This homeless man got a lecture but that one got
    carted off to jail.

22
  • One-anaphora
  • Clinton used to have a dog called Buddy. Now
    he's got another one.

23
Pronouns
  • A large tiger escaped from the Central Park zoo
    chasing a tiny sparrow. It was recaptured by a
    brave policeman.
  • Referents of pronouns require some degree of
    salience in the discourse model (as opposed to
    definite and indefinite NPs, e.g.)
  • How do items become salient in discourse?

24
Salience via Simple Recency
  • He had dodged the press for 36 hours, but
    yesterday the Buck House Butler came out of the
    cocoon of his room at the Millennium Hotel in New
    York and shoveled some morsels the way of the
    panting press. First there was a brief, if
    obviously self-serving, statement, and then, in
    good royal tradition, a walkabout.

25
Salience via Structural Recency
  • E: So you have the engine assembly finished. Now
    attach the rope. By the way, did you buy the gas
    can today?
  • A: Yes.
  • E: Did it cost much?
  • A: No.
  • E: OK, good. Have you got it attached yet?

26
Inferables
  • I almost bought an Acura Integra today, but a
    door had a dent and the engine seemed noisy.
  • Mix the flour, butter, and water. Knead the dough
    until smooth and shiny.

27
Discontinuous Sets
  • Entities evoked together but mentioned in
    different sentences or phrases
  • John has a St. Bernard and Mary has a Yorkie.
    They arouse some comment when they walk them in
    the park.
  • John has a St. Bernard. Mary has a Yorkie. They
    arouse some comment when they walk them in the
    park.

28
Generics
  • I saw two Corgis and their seven puppies today.
    They are the funniest dogs!

29
Constraints on Coreference
  • Number agreement
  • John's parents like opera. John hates it / John
    hates them.
  • Person and case agreement
  • Nominative: I, we, you, he, she, they
  • Accusative: me, us, you, him, her, them
  • Genitive: my, our, your, his, her, their
  • George and Edward brought bread and cheese. They
    shared them.

30
  • Gender agreement
  • John has a Porsche. He/it/she is attractive.
  • Syntactic constraints: binding theory
  • John bought himself a new Volvo. (himself = John)
  • John bought him a new Volvo. (him ≠ John)
  • Selectional restrictions
  • John left his plane in the hangar.
  • He had flown it from Memphis this morning.

31
Pronoun Interpretation Preferences
  • Recency
  • John bought a new boat. Bill bought a bigger
    one. Mary likes to sail it.
  • But grammatical role raises its ugly head
  • John went to the Acura dealership with Bill. He
    bought an Integra.
  • Bill went to the Acura dealership with John. He
    bought an Integra.
  • ?John and Bill went to the Acura dealership. He
    bought an Integra.

32
  • And so does repeated mention
  • John needed a car to go to his new job. He
    decided that he wanted something sporty. Bill
    went to the dealership with him. He bought a
    Miata.
  • Who bought the Miata?
  • What about grammatical role preference?
  • Parallel constructions
  • Saturday, Mary went with Sue to the farmers'
    market.
  • Sally went with her to the bookstore.
  • Sunday, Mary went with Sue to the mall.
  • Sally told her she should get over her shopping
    obsession.

33
  • Verb semantics/thematic roles
  • John telephoned Bill. He'd lost the directions
    to his house.
  • John criticized Bill. He'd lost the directions
    to his house.

34
Pragmatics
  • Context-dependent meaning
  • Jeb Bush was helped by his brother and so was
    Frank Lautenberg. (Strict vs. Sloppy)
  • Mike Bloomberg bet George Pataki a baseball cap
    that he could/couldn't run the marathon in under
    3 hours.
  • Mike Bloomberg bet George Pataki a baseball cap
    that he could/couldn't be hypnotized in under 1
    minute.

35
Summary: What Factors Affect Reference Resolution?
  • Lexical factors
  • Reference type: inferability, discontinuous sets,
    generics, one-anaphora, pronouns, ...
  • Discourse factors
  • Recency
  • Focus/topic structure, digression
  • Repeated mention
  • Syntactic factors
  • Agreement: gender, number, person, case
  • Parallel construction
  • Grammatical role

36
  • Selectional restrictions
  • Semantic/lexical factors
  • Verb semantics, thematic role
  • Pragmatic factors

37
Anaphora resolution
  • Finding in a text all the referring expressions
    that have one and the same denotation
  • Pronominal anaphora resolution
  • Anaphora resolution between named entities
  • Full noun phrase anaphora resolution

38
Reference Resolution
  • Given these types of constraints, can we
    construct an algorithm that will apply them such
    that we can identify the correct referents of
    anaphors and other referring expressions?

39
Issues
  • Which constraints/features can/should we make use
    of?
  • How should we order them? i.e. which override
    which?
  • What should be stored in our discourse model?
    I.e., what types of information do we need to
    keep track of?
  • How to evaluate?

40
Three Algorithms
  • Lappin & Leass '94: weighting via recency and
    syntactic preferences
  • Hobbs '78: syntax tree-based referential search
  • Centering (Grosz, Joshi, Weinstein '95 and
    various): discourse-based search

41
Lappin & Leass '94
  • Weights candidate antecedents by recency and
    syntactic preference (86% accuracy)
  • Two major functions to perform
  • Update the discourse model when an NP that evokes
    a new entity is found in the text, computing the
    salience of this entity for future anaphora
    resolution
  • Find most likely referent for current anaphor by
    considering possible antecedents and their
    salience values

42
Saliency Factor Weights
  • Sentence recency (in current sentence?): 100
  • Subject emphasis (is it the subject?): 80
  • Existential emphasis (existential predicate
    nominal?): 70
  • Accusative emphasis (is it the direct object?): 50
  • Indirect object/oblique complement emphasis: 40
  • Non-adverbial emphasis (not in a PP): 50
  • Head noun emphasis (is the head noun): 80

43
  • Implicit ordering of arguments:
  • subject > existential predicate nominal > object
    > indirect object or oblique > adverbial PP
  • On the sofa, the cat was eating bonbons.
  • sofa: 100 + 80 = 180
  • cat: 100 + 80 + 50 + 80 = 310
  • bonbons: 100 + 50 + 50 + 80 = 280
  • Update
  • Weights accumulate over time
  • Cut in half after each sentence processed
  • Salience values for subsequent referents
    accumulate for the equivalence class of
    co-referential items (exceptions, e.g. multiple
    references in the same sentence)

44
  • The bonbons were clearly very tasty.
  • sofa: 180/2 = 90
  • cat: 310/2 = 155
  • bonbons: 280/2 + (100 + 80 + 50 + 80) = 450
  • Additional salience weights for grammatical role
    parallelism (+35) and cataphora (-175) are
    calculated when a pronoun is to be resolved
  • Additional constraints on gender/number
    agreement/syntax
  • They were a gift from an unknown admirer.
  • sofa: 90/2 = 45
  • cat: 155/2 = 77.5
  • bonbons: 450/2 = 225, +35 = 260

45
Reference Resolution
  • Collect potential referents (up to four sentences
    back): sofa, cat, bonbons
  • Remove those that don't agree in number/gender
    with the pronoun, leaving bonbons
  • Remove those that don't pass intra-sentential
    syntactic coreference constraints
  • The cat washed it. (it ≠ cat)
  • Add applicable values for role parallelism (+35)
    or cataphora (-175) to the current salience value
    of each potential antecedent
  • Select the referent with the highest salience; if
    tied, select the closest referent in the string
    (a bookkeeping sketch follows below)
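A minimal bookkeeping sketch of the procedure above, reusing the factor weights from the earlier slide; the mention/agreement plumbing is illustrative, not the full Lappin & Leass algorithm:

WEIGHTS = {'recency': 100, 'subject': 80, 'existential': 70,
           'accusative': 50, 'indirect_object': 40,
           'non_adverbial': 50, 'head_noun': 80}

salience = {}  # entity -> accumulated salience value

def mention(entity, factors):
    """Add the applicable factor weights for a new mention."""
    salience[entity] = salience.get(entity, 0) + sum(WEIGHTS[f] for f in factors)

def end_of_sentence():
    """Cut every value in half after each sentence is processed."""
    for entity in salience:
        salience[entity] /= 2

def resolve(agrees):
    """Pick the agreeing candidate with the highest salience."""
    candidates = [e for e in salience if agrees(e)]
    return max(candidates, key=salience.get, default=None)

# "On the sofa, the cat was eating bonbons."
mention('sofa', ['recency', 'head_noun'])                                    # 180
mention('cat', ['recency', 'subject', 'non_adverbial', 'head_noun'])         # 310
mention('bonbons', ['recency', 'accusative', 'non_adverbial', 'head_noun'])  # 280
end_of_sentence()  # halve before the next sentence: 90, 155, 140

# Resolving "they" later would call, e.g.:
# resolve(lambda e: e == 'bonbons')   # only bonbons agrees in number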

46
Text Coherence
  • Example
  • (1) John hid Bill's car keys.
  • (2) He was drunk.
  • (1) John hid Bill's car keys.
  • (2) He likes junk food.
  • (1) George Bush supports big business.
  • (2) He's sure to veto House Bill 1711.
  • Hearers try to find connections between
    utterances in a discourse.
  • The possible connections between utterances can
    be specified as a set of coherence relations.

47
Coherence relations (Hobbs, 1979)
  • Result: S0 causes S1.
  • John bought an Acura. His father went ballistic.
  • Explanation: S1 causes S0.
  • John hid Bill's car keys. He was drunk.
  • Parallel: S0 and S1 are parallel.
  • John bought an Acura. Bill bought a BMW.
  • Elaboration: S1 is an elaboration of S0.
  • John bought an Acura this weekend. He purchased
    it for 40 thousand dollars.

48
Discourse structure
  • S1: John took a train to Bill's car dealership.
  • S2: He needed to buy a car.
  • S3: The company he works for now isn't near any
    public transportation.
  • S4: He also wanted to talk to Bill about their
    softball leagues.


[Diagram: Explanation (S2 explains S1)]
49
Discourse structure
  • S1: John took a train to Bill's car dealership.
  • S2: He needed to buy a car.
  • S3: The company he works for now isn't near any
    public transportation.
  • S4: He also wanted to talk to Bill about their
    softball leagues.



[Diagram: Explanation (S2 explains S1); Parallel
(S2 and S4)]
50
Discourse structure
  • S1: John took a train to Bill's car dealership.
  • S2: He needed to buy a car.
  • S3: The company he works for now isn't near any
    public transportation.
  • S4: He also wanted to talk to Bill about their
    softball leagues.



[Diagram: Explanation (S2 explains S1); Explanation
(S3 explains S2); Parallel (S2 and S4)]
51
Discourse parsing
Explanation (e1)
├── S1 (e1)
└── Parallel (e2; e4)
    ├── Explanation (e2)
    │   ├── S2 (e2)
    │   └── S3 (e3)
    └── S4 (e4)
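One illustrative way to hold this parse in code is as nested (relation, child, child) tuples; purely a representation sketch, not a discourse parser:

# The parse above: a leaf is a segment label, an internal
# node is a (relation, child, child) tuple.
tree = ('Explanation',
        'S1',
        ('Parallel',
         ('Explanation', 'S2', 'S3'),
         'S4'))

def segments(node):
    """Collect the leaf segments of a discourse tree."""
    if isinstance(node, str):
        return [node]
    relation, *children = node
    return [s for child in children for s in segments(child)]

print(segments(tree))  # ['S1', 'S2', 'S3', 'S4']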
52
Why compute discourse structure?
  • Natural language understanding
  • Summarization
  • Information retrieval
  • Natural language generation
  • Reference resolution

53
Two theories on discourse structure
  • Mann and Thompson's Rhetorical Structure Theory
    (1988)
  • Grosz and Sidner's attention, intention, and
    structure of discourse (1986)

54
Rhetorical structure theory (RST)
  • Mann and Thompson (1988)
  • One theory of discourse structure, based on
    identifying relations between parts of the text
  • Defined 20 rhetorical relations
  • Presentational relations: intentional
  • Subject matter relations: informational
  • Nucleus: central segment of text
  • Satellite: more peripheral segment
  • Relation definitions and more ...

55
Presentational relations
  • Those whose intended effect is to increase some
    inclination in the hearer.
  • Relations: Antithesis, Background, Concession,
    Enablement, Evidence, Justify, Motivation,
    Preparation, Restatement, Summary

56
Subject matter relations
  • Those whose intended effect is that the hearer
    recognize the relation in question.
  • Relations: Circumstance, Condition, Elaboration,
    Evaluation, Interpretation, Means, Non-volitional
    cause, Non-volitional result, Otherwise, Purpose,
    Solutionhood, Unconditional, Unless, Volitional
    cause, Volitional result

57
Multinuclear relations
  • Contrast
  • Joint
  • List
  • Multinuclear restatement
  • Sequence

58
Some examples
  • Explanation: John went to the coffee shop. He
    was sleepy.
  • Elaboration: John likes coffee. He drinks it
    every day.
  • Contrast: John likes coffee. Mary hates it.

59
Discourse structure
cause
├── contrast
│   ├── elaboration
│   │   ├── John likes coffee.
│   │   └── He drinks it every day.
│   └── Mary hates coffee.
└── They argue a lot.
60
A relation: Evidence
  • (a) George Bush supports big business.
  • (b) He's sure to veto House Bill 1711.
  • Relation name: Evidence
  • Constraints on Nucl: H might not believe Nucl to
    a degree satisfactory to S.
  • Constraints on Sat: H believes Sat or will find
    it credible.
  • Constraints on Nucl+Sat: H's comprehending Sat
    increases H's belief of Nucl.
  • Effect: H's belief of Nucl is increased.

61
A relation: Volitional-Cause
  • (a) George Bush supports big business.
  • (b) He's sure to veto House Bill 1711.
  • Relation name: Volitional-Cause
  • Constraints on Nucl: presents a volitional action
  • Constraints on Sat: none
  • Constraints on Nucl+Sat: Sat presents a situation
    that could have caused the agent of the
    volitional action in Nucl to perform the action.
  • Effect: H recognizes the situation presented in
    Sat as a cause for the volitional action
    presented in Nucl.

62
Another example
  • S: (a) Come home by 5:00. (b) Then we can go to
    the hardware store before it closes. (c) That way
    we can finish the bookshelves tonight.

[Diagram: two analyses. Intentional level: (b) and
(c) provide motivation for (a). Informational
level: (a) is a condition for (b), and (b) is a
condition for (c).]
63
Problems with RST (Moore & Pollack, 1992)
  • How many rhetorical relations are there?
  • How can we use RST in dialogues?
  • How do we incorporate speaker intentions into
    RST?
  • RST does not allow for multiple relations between
    parts of a discourse; informational and
    intentional levels must coexist.

64
Grosz & Sidner (1986)
65
Grosz and Sidner (1986)
  • A leading theory of discourse structure
  • Three components
  • A linguistic structure
  • An intentional structure
  • An attentional state

66
Linguistic structure
  • The structure of the sequence of utterances that
    comprises a discourse.
  • Utterances form Discourse Segments (DSs), and a
    discourse is made up of embedded DSs.
  • What exactly is a DS?
  • Any evidence that humans naturally recognize
    segment boundaries?
  • Do humans agree on segment boundaries?
  • How to find the boundaries automatically?

67
Intentional structure
  • Speakers in a discourse may have many intentions,
    public or private.
  • Discourse purpose (DP): the intention that
    underlies engaging in a discourse.
  • Discourse segment purpose (DSP): the purpose of a
    DS. How does this segment contribute to achieving
    the overall DP?
  • Two relations between DSPs
  • Dominance: if DSP1 contributes to DSP2, we say
    DSP2 dominates DSP1.
  • Satisfaction-precedence: DSP1 must be satisfied
    before DSP2.

68
Attentional State
  • The attentional state is an abstraction of the
    participants' focus of attention as their
    discourse unfolds.
  • The state is a stack of focus spaces.
  • A focus space (FS) is associated with a DS, and
    it contains the DSP and the objects, properties,
    and relations salient in the DS.
  • When a DS starts, its FS is pushed onto the
    stack.
  • When a DS ends, its FS is popped. (A stack
    sketch follows below.)
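A minimal sketch of that stack discipline; the FocusSpace fields follow the slide (a DSP plus the salient objects), while the class itself is illustrative:

from dataclasses import dataclass, field

@dataclass
class FocusSpace:
    dsp: str                                   # the segment's purpose
    salient: set = field(default_factory=set)  # objects/properties/relations

stack = []

def begin_segment(dsp):
    """A DS starts: push its focus space."""
    stack.append(FocusSpace(dsp))

def end_segment():
    """A DS ends: pop its focus space."""
    stack.pop()

begin_segment('C wants A to find a flight for C')  # DS0 opens
begin_segment('A wants to know the destination')   # an embedded DS opens
end_segment()                                      # embedded DS closes; DS0 on top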

69
An example
  • C1: I need to travel in May.
  • A1: And, what day in May do you want to travel?
  • C2: I need to be there for a meeting on the 15th.
  • A2: And you are flying into what city?
  • C3: Seattle.
  • A3: And what time would you like to leave
    Pittsburgh?
  • C4: Hmm. I don't think there are many options
    for non-stop.
  • A4: There are three non-stops today.
  • C5: What are they?
  • ...

[Diagram: nested segment brackets DS0-DS5 over the
dialogue; DS0 spans the whole exchange]
70
Discourse structure with intention info
[Diagram: DS0 spans the dialogue, with DS1 = C1,
DS2 = A1-C2, DS3 = A2-C3, DS4 = A3, DS5 = C4-C7]
  • I0: C wants A to find a flight for C
  • I1: C wants A to know that C is traveling in May.
  • I2: A wants to know the departure date
  • I3: A wants to know the destination
  • I4: A wants to know the departure time
  • I5: C wants A to find a nonstop flight

71
Problems with G&S 1986
  • Assumes that discourses are task-oriented
  • Assumes there is a single, hierarchical structure
    shared by speaker and hearer
  • Do people really build such structures when they
    speak? Do they use them in interpreting what
    others say?

72
Building discourse structure
73
Tasks
  • Identify discourse segment boundaries
  • Determine relations between segments
  • Determine intentions of the segments
  • Determine the attentional state
  • Methods
  • Inference-based approach: symbolic
  • Cue-based approach: statistical

74
Inference-based approach
  • Ex: John hid Bill's car keys. He was drunk.
  • X is drunk → people do not want X to drive
  • People don't want X to drive → people hide X's
    car keys.
  • Abduction

→ AI-complete: requires and utilizes world
knowledge.
75
Cue-based approach
  • Attentional state
  • Attentional changes
  • (push) now, next, but, ...
  • (pop) anyway, in any case, now back to, ok,
    fine, ...
  • True interruption: excuse me, I must interrupt
  • Flashback: oops, I forgot
  • Intention
  • Satisfaction-precedes: first, second,
    furthermore, ...
  • Dominance: for example, first, second, ...

76
Cues (cont)
  • Linguistic structure
  • Elaboration: for example, ...
  • Concession: although
  • Condition: if
  • Sequence: and, first, second, ...
  • Contrast: and, ... (a toy cue classifier follows
    below)
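A toy sketch of the cue-based idea, keying off the push/pop cue phrases listed on the previous slide; real systems learn such cues statistically, and this prefix matching is deliberately naive (pop cues are checked first so that "now back to" is not mistaken for the push cue "now"):

POP_CUES = ('anyway', 'in any case', 'now back to', 'ok', 'fine')
PUSH_CUES = ('now', 'next', 'but')

def attentional_change(utterance):
    """Classify an utterance as a push, pop, or continuation."""
    u = utterance.lower().strip()
    if any(u.startswith(c) for c in POP_CUES):
        return 'pop'
    if any(u.startswith(c) for c in PUSH_CUES):
        return 'push'
    return 'continue'

print(attentional_change('Now back to the engine assembly.'))  # 'pop'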

77
One example
  • (Marcu 1999) Train a parser on a discourse
    treebank.
  • 90 trees, hand-annotated for rhetorical relations
    (RR)
  • Learn to identify Elementary discourse units
    (EDUs)
  • Learn to identify N, S, and their relation.
  • Features: WordNet-based similarity, lexical,
    structural, ...

78
Results
  • Identifying EDUs: 96-98% accuracy
  • Identifying hierarchical structure (2 EDUs are
    related): Recall = 71%, Precision = 84%
  • Identifying nucleus/satellite labels: Recall =
    58%, Precision = 69%
  • Identifying the rhetorical relation: Recall =
    38%, Precision = 45%
  • → Hierarchical structure is easier to identify
    than rhetorical relations.

79
Next Class
  • Monday, March 19
  • Chapter 19