Computational Models of Text Quality

1
Computational Models of Text Quality
  • Ani Nenkova
  • University of Pennsylvania
  • ESSLLI 2010, Copenhagen

2
The ultimate text quality application
  • Imagine your favorite text editor
  • With spell-checker and grammar checker
  • But also functions that tell you
  • Word W is repeated too many times
  • "Fill the gap" is a cliché
  • You might consider using this more figurative
    expression
  • This sentence is unclear and hard to read
  • What is the connection between these two
    sentences?
  • ..

3
Currently
  • It is our friends who give such feedback
  • Often conflicting
  • We might agree that a text is good, but find it
    hard to explain exactly why
  • Computational linguistics should have some
    answers
  • Though far from offering a complete solution yet

4
In this course
  • We will overview research dealing with various
    aspects of text quality
  • A unified approach does not yet exist, but many
    proposals have been
  • tested on corpus data
  • integrated in applications

5
Current applications: education
  • Grading student writing
  • Is this a good essay?
  • One of the graders of SAT and GRE essays is in
    fact a machine! [1]
  • http://www.ets.org/research/capabilities/automated_scoring
  • Providing appropriate reading material
  • Is this text good for a particular user?
  • Appropriate grade level
  • Appropriate language competency in L2 [2,3]
  • http://reap.cs.cmu.edu/

6
Current applications: information retrieval
  • Particularly user-generated content
  • Questions and answers on the web
  • Blogs and comments
  • Searching over such content poses new problems
    [4]
  • What is a good question/answer/comment?
  • http://answers.yahoo.com/
  • Relevant for general IR as well
  • Of the many relevant documents, some are better
    written

7
Current applications: NLP
  • Models of text quality
  • lead to improved systems [5]
  • offer possibilities for automatic evaluation [6]
  • Automatic summarization
  • Select important content and organize it into a
    well-written text
  • Language generation
  • Select, organize and present content on document,
    paragraph, sentence and phrase level
  • Machine translation

8
Text quality factors
  • Interesting
  • Style (clichés, figurative language)
  • Vocabulary use
  • Grammatical and fluent sentences
  • Coherent and easy to understand
  • In most types of writing, well-written means
    clear and easy to understand. Not necessarily so
    in literary works.
  • Problems with clarity of instructions motivated a
    fair amount of early work.

9
Early work: keep in mind these predate modern
computers!
  • Common words are easier to understand
  • stentorian vs. loud
  • myocardial infarction vs. heart attack
  • Common words are short
  • Standard readability metrics
  • percentage of words not among the N most frequent
  • average number of syllables per word
  • Syntactically simple sentences are easier to
    understand
  • average number of words per sentence
  • Flesch-Kincaid, Automated Readability Index,
    Gunning-Fog, SMOG, Coleman-Liau
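
As an illustration of how such metrics combine the two averages above, here is a minimal sketch of the Flesch-Kincaid grade level. The coefficients are the standard published ones; the vowel-group syllable counter and the regex tokenization are simplifying assumptions made for this example, not part of the original metric.

```python
import re

def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Grade level from average words per sentence and average syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

simple = "The cat sat. The dog ran."
hard = "Myocardial infarction necessitates immediate hospitalization."
print(flesch_kincaid_grade(simple), flesch_kincaid_grade(hard))
```

Short sentences made of short words receive a low grade; the single long sentence of polysyllabic words receives a much higher one.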

10
Modern equivalents
  • Language models
  • Word probabilities from a large collection
  • http://www.speech.cs.cmu.edu/SLM_info.html
  • Features derived from syntactic parse [2,7,8,9]
  • Parse tree height
  • Number of subordinating conjunctions
  • Number of passive voice constructions
  • Number of noun and verb phrases

11
Language models
  • Unigram and bigram language models
  • Really, just huge tables
  • Smoothing necessary to account for unseen words

12
Features from language models
  • Assessing the readability of text t consisting of
    m words, for intended audience class c
  • Number of out of vocabulary words in the text
    with respect to the language model for c
  • Text likelihood and perplexity
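
The features above can be sketched with a unigram model. This is a minimal illustration, assuming add-one smoothing with a single reserved slot for unseen words and a toy training text standing in for the corpus of audience class c; real systems use far larger collections and better smoothing.

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Return (counts, total) for a unigram model of one audience class."""
    counts = Counter(tokens)
    return counts, sum(counts.values())

def unigram_prob(word, counts, total):
    """Add-one smoothed probability; one extra vocabulary slot covers unseen words."""
    vocab_size = len(counts) + 1
    return (counts.get(word, 0) + 1) / (total + vocab_size)

def lm_features(text_tokens, counts, total):
    """Out-of-vocabulary count, log likelihood and perplexity of a text of m words."""
    oov = sum(1 for w in text_tokens if w not in counts)
    log_likelihood = sum(math.log(unigram_prob(w, counts, total)) for w in text_tokens)
    perplexity = math.exp(-log_likelihood / len(text_tokens))
    return oov, log_likelihood, perplexity

# Toy training data for a hypothetical class c
counts, total = train_unigram("the cat sat on the mat the dog sat".split())
easy = lm_features("the cat sat".split(), counts, total)
hard = lm_features("the stentorian cat".split(), counts, total)
print(easy, hard)
```

The text containing an unseen word has one out-of-vocabulary item, a lower likelihood and a higher perplexity under the model for c.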

13
Application to grade level prediction
(Collins-Thompson and Callan, NAACL 2004) [10]

14
Application to grade level prediction
(Collins-Thompson and Callan, NAACL 2004) [10]

15
Results on predicting grade level (Schwarm and
Ostendorf, ACL 2005) [11]
  • Flesch-Kincaid Grade Level index
  • number of syllables per word
  • sentence length
  • Lexile
  • word frequency
  • sentence length
  • SVM features
  • language models and syntax

16
Models of text coherence
  • Global coherence
  • Overall document organization
  • Local coherence
  • Adjacent sentences

17
Text structure can be learnt in an unsupervised
manner
  • Human-written examples from a domain
  • [Figure: topic labels in the example texts:
    location, time; damage; magnitude; relief efforts]

18
Content model (Barzilay and Lee, 2004) [5]
  • Hidden Markov Model (HMM)-based
  • States - clusters of related sentences (topics)
  • Transition prob. - sentence precedence in corpus
  • Emission prob. - bigram language model
  • [Figure: earthquake reports, with topics such as
    location, magnitude; casualties; relief efforts;
    labels "Transition from previous topic" and
    "Generating sentence in current topic"]

19
Generating Wikipedia articles (Sauper and
Barzilay, 2009) [12]
  • Articles on diseases and American film actors
  • Create templates of subtopics
  • Focus only on subtopic level structure
  • Use paragraphs from documents on the web

20
Template creation
  • Cluster similar headings
  • signs and symptoms, symptoms, early symptoms
  • Choose k clusters
  • average number of subtopics in that domain
  • Find majority ordering for the clusters

Biography: Early life, Career, Personal life, Death
Diseases: Symptoms, Causes, Diagnosis, Treatment

21
Extraction of excerpts and ranking
  • Candidates for a subtopic
  • Paragraphs from top 10 pages of search results
  • Measure relevance of candidates for that subtopic
  • Features: unigrams, bigrams, number of
    sentences

22
Need to control redundancy across subtopics
  • Integer Linear Program
  • Variables
  • One per excerpt (value 1 if chosen, 0 otherwise)
  • Objective
  • Minimize sum of the ranks of the excerpts chosen
  • [Figure: candidate excerpts ranked 1 to 5 for each
    subtopic: causes, symptoms, diagnosis, treatment]
  • Constraints
  • Cosine similarity between any selected pair < 0.5
  • One excerpt per subtopic
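
To make the objective and constraints concrete, here is a small sketch that solves the same selection problem by exhaustive search. It is a stand-in for an ILP solver and only practical for a handful of candidates; the bag-of-words dictionaries, the function names and the toy excerpts are all assumptions made for illustration.

```python
import math
from itertools import product

def cosine(a, b):
    """Cosine similarity between two bag-of-words dicts."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_excerpts(candidates, max_sim=0.5):
    """candidates: {subtopic: [bag_of_words, ...]}, each list sorted best-first (rank 1 first).
    Picks one excerpt per subtopic, minimizing the sum of ranks subject to every
    selected pair having cosine similarity below max_sim. Returns None if infeasible."""
    topics = list(candidates)
    best, best_cost = None, None
    for choice in product(*(range(len(candidates[t])) for t in topics)):
        picked = [candidates[t][i] for t, i in zip(topics, choice)]
        redundant = any(cosine(picked[i], picked[j]) >= max_sim
                        for i in range(len(picked)) for j in range(i + 1, len(picked)))
        cost = sum(i + 1 for i in choice)  # ranks are 1-based
        if not redundant and (best_cost is None or cost < best_cost):
            best, best_cost = dict(zip(topics, choice)), cost
    return best

# Hypothetical excerpts: the top-ranked "symptoms" excerpt duplicates the top "causes" one.
candidates = {
    "causes": [{"virus": 1, "fever": 1}, {"virus": 1, "fever": 1, "infection": 1}],
    "symptoms": [{"fever": 1, "virus": 1}, {"cough": 1, "rash": 1}],
}
selection = select_excerpts(candidates)
print(selection)
```

Because the two top-ranked excerpts are near-duplicates, the search keeps the best "causes" excerpt and falls back to the second-ranked "symptoms" excerpt.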

23
Linguistic models of coherence (Halliday and
Hasan, 1976) [13]
  • Coherent text is characterized by the presence of
    various types of cohesive links that facilitate
    text comprehension
  • Reference and lexical reiteration
  • Pronouns, definite descriptions, semantically
    related words
  • Discourse relations (conjunction)
  • I closed the window because it started raining.
  • Substitution (one) or ellipses (do)

24
Referential coherence
  • Centering theory
  • tracking focus of attention across adjacent
    sentences [14, 15, 16, 17]
  • Syntactic form of references
  • Particularly first and subsequent mention [18,
    19], pronominalization
  • Lexical chains
  • Identifying and tracking topics within a text
    [20, 21, 22, 23]

25
Discourse relations
  • Explicit vs. implicit
  • I stayed home because I had a headache
  • Signaled by a discourse connective
  • Inferred without the presence of a connective
  • I took my umbrella. [Because] The forecast was
    for rain in the afternoon.

26
Lexical chains
  • Often discussed as a cohesion indicator and
    implemented in systems, but not used in text
    quality tasks
  • Find all words that refer to the same topic
  • Find the correct sense of the words
  • LexChainer Tool:
    http://www1.cs.columbia.edu/nlp/tools.cgi [23]
  • Applications: summarization, IR, spell checking,
    hypertext construction
  • John bought a Jaguar. He loves the car.
  • LC = {jaguar, car, engine, it}

27
Centering theory ingredients (Grosz et al., 1995)
  • Deals with local coherence
  • What happens to the flow from sentence to
    sentence
  • Does not deal with global structuring of the text
    (paragraphs/segments)
  • Defines coherence as an estimate of the
    processing load required to understand the text

28
Processing load
  • Upon hearing a sentence, a person
  • Expends cognitive effort to interpret the
    expressions in the utterance
  • Integrates the meaning of the utterance with that
    of the previous sentence
  • Creates some expectations on what might come next

29
Example
  • John met his friend Mary today.
  • He was surprised to see her.
  • He thought she was still in Italy.
  • Form of referring expressions
  • Anaphora needs to be resolved
  • Create a discourse entity at first mention with
    full noun phrase
  • Creating expectations

30
Creating and meeting expectations
  • (1) a. John went to his favorite music store to
    buy a piano.
  • b. He had frequented the store for many
    years.
  • c. He was excited that he could finally buy
    a piano.
  • d. He arrived just as the store was closing
    for the day.
  • (2) a. John went to his favorite music store to
    buy a piano.
  • b. It was a store John had frequented for
    many years.
  • c. He was excited that he could finally buy
    a piano.
  • d. It was closing just as John arrived.

31
Interpreting pronouns
  1. Terry really goofs sometimes.
  2. Yesterday was a beautiful day and he was excited
    about trying out his new sailboat.
  3. He wanted Tony to join him on a sailing
    expedition.
  4. He called him at 6am.
  5. He was sick and furious at being woken up so
    early.

32
Basic centering definitions
  • Centers of an utterance
  • Set of entities serving to link that utterance to
    the other utterances in the discourse segment
    that contains it
  • Not words or phrases themselves
  • Semantic interpretations of noun phrases

33
Types of centers
  • Forward looking centers
  • An ordered set of entities
  • What could we expect to hear about next
  • Ordered by salience as determined by grammatical
    function
  • Subject > Indirect object > Object > Others
  • John gave the textbook to Mary.
  • Cf = {John, Mary, textbook}
  • Preferred center, Cp
  • The highest ranked forward looking center
  • High expectation that the next utterance in the
    segment will be about Cp

34
Backward looking center
  • Single backward looking center, Cb(U)
  • For each utterance other than the segment-initial
    one
  • The backward looking center of utterance Un+1
    connects with one of the forward looking centers
    of Un
  • Cb(Un+1) is the most highly ranked element from
    Cf(Un) that is also realized in Un+1
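
The definitions of Cf, Cp and Cb can be sketched as follows. This is only an illustration under strong assumptions: entities are already resolved (coreference is given) and each mention arrives with its grammatical function, neither of which comes for free from raw text. The role names and helper functions are hypothetical.

```python
# Salience ranking by grammatical function: lower number = more salient.
RANK = {"subject": 0, "indirect_object": 1, "object": 2, "other": 3}

def forward_centers(utterance):
    """utterance: list of (entity, grammatical_function) pairs.
    Returns Cf, the entities ordered by salience."""
    ordered = sorted(utterance, key=lambda pair: RANK[pair[1]])
    cf = []
    for entity, _ in ordered:
        if entity not in cf:
            cf.append(entity)
    return cf

def preferred_center(utterance):
    """Cp: the highest-ranked forward looking center."""
    return forward_centers(utterance)[0]

def backward_center(previous, current):
    """Cb(Un+1): the highest-ranked element of Cf(Un) also realized in Un+1, else None."""
    realized = {entity for entity, _ in current}
    for entity in forward_centers(previous):
        if entity in realized:
            return entity
    return None

# "John gave the textbook to Mary." followed by a hypothetical "Mary opened the textbook."
u1 = [("John", "subject"), ("textbook", "object"), ("Mary", "indirect_object")]
u2 = [("Mary", "subject"), ("textbook", "object")]
print(forward_centers(u1), backward_center(u1, u2), preferred_center(u2))
```

Both Mary and the textbook are realized in the second utterance, but Mary ranks higher in Cf of the first, so she is the backward looking center.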

35
Centering transitions ordering

                        Cb(Un+1) = Cb(Un)      Cb(Un+1) != Cb(Un)
                        OR Cb(Un) = ?
Cb(Un+1) = Cp(Un+1)     continue               smooth-shift
Cb(Un+1) != Cp(Un+1)    retain                 rough-shift
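
The transition table translates directly into a small function. A minimal sketch: the function name is hypothetical, and the "no-cb" label for an utterance without a backward looking center is an assumption added so the function is total.

```python
def classify_transition(cb_prev, cb_cur, cp_cur):
    """Name the centering transition into the current utterance Un+1.
    cb_prev is Cb(Un), or None when it is undefined; cb_cur is Cb(Un+1); cp_cur is Cp(Un+1)."""
    if cb_cur is None:
        return "no-cb"  # no backward looking center at all (label is an assumption)
    same_cb = cb_prev is None or cb_cur == cb_prev
    if cb_cur == cp_cur:
        return "continue" if same_cb else "smooth-shift"
    return "retain" if same_cb else "rough-shift"

print(classify_transition("Terry", "Tony", "Tony"))
```

For example, moving from Cb = Terry to an utterance whose Cb and Cp are both Tony is a smooth-shift.
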
36
Centering constraints
  • There is precisely one backward-looking center
    Cb(Un)
  • Cb(Un+1) is the highest-ranked element of Cf(Un)
    that is realized in Un+1

37
Centering rules
  • If some element of Cf(Un) is realized as a
    pronoun in Un+1 then so is Cb(Un+1)
  • Transitions not equal
  • continue > retain > smooth-shift > rough-shift

38
Centering analysis
  • Terry really goofs sometimes.
  • Cf = {Terry}, Cb = ?, undef
  • Yesterday was a beautiful day and he was excited
    about trying out his new sailboat.
  • Cf = {Terry, sailboat}, Cb = Terry, continue
  • He wanted Tony to join him in a sailing
    expedition.
  • Cf = {Terry, Tony, expedition}, Cb = Terry, continue
  • He called him at 6am.
  • Cf = {Terry, Tony}, Cb = Terry, continue

39
  • He called him at 6am.
  • Cf = {Terry, Tony}, Cb = Terry, continue
  • Tony was sick and furious at being woken up so
    early.
  • Cf = {Tony}, Cb = Tony, smooth-shift
  • He told Terry to get lost and hung up.
  • Cf = {Tony, Terry}, Cb = Tony, continue
  • Of course, Terry hadn't intended to upset Tony.
  • Cf = {Terry, Tony}, Cb = Tony, retain

40
Rough shifts in evaluation of writing skills
(Miltsakaki and Kukich, 2000) [15]
  • Automatic grading of essays by E-rater
  • Syntactic variety
  • Represented by features that quantify the
    occurrence of clause types
  • Clear transitions
  • Cue phrases in certain syntactic constructions
  • Existence of main and supporting points
  • Appropriateness of the vocabulary content of the
    essay
  • What about local coherence?

41
Essay score model
  • Human score available
  • E-rater prediction available
  • Percentage of rough-shifts in each essay
    (analysis done manually)
  • Negative correlation between the human score and
    the percentage of rough-shifts

42
  • Linear multi-factor regression
  • Approximate the human score as a linear function
    of the e-rater prediction and the percentage of
    rough-shifts
  • Adding rough shifts significantly improves the
    model of the score
  • 0.5 improvement on a 1-6 scale
  • How easy/difficult would it be to fully automate
    the rough-shift variable?

43
Variants of centering and application to
information ordering
  • Karamanis et al. (2009) [16] is the most
    comprehensive overview of variants of centering
    theory and an evaluation of centering in a
    specific task related to text quality

44
Information ordering task
  • Given a set of sentences/clauses, what is the
    best presentation?
  • Take a newspaper article and jumble the
    sentences---the result will be much more
    difficult to read than the original
  • Negative examples constructed by randomly
    permuting the original
  • Criteria for deciding which of two orderings is
    better
  • Centering would definitely be applicable

45
Centering variations
  • Continuity (NOCB = lack of continuity)
  • Cf(Un) and Cf(Un+1) share at least one element
  • Coherence
  • Cb(Un) = Cb(Un+1)
  • Salience
  • Cb(U) = Cp(U)
  • Cheapness (fulfilled expectations)
  • Cb(Un+1) = Cp(Un)

46
Metrics of coherence
  • M.NOCB (no continuity)
  • M.CHEAP (expectations not met)
  • M.KP: sum of the violations of continuity,
    cheapness, coherence and salience
  • M.BFP: seeks to maximize transitions according to
    Rule 2
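
The simplest of these metrics, M.NOCB, can be sketched in a few lines. The sketch assumes each sentence has already been reduced to the set of entities it mentions, which stands in for its Cf list; the entity sets below are illustrative.

```python
def nocb_count(ordering):
    """ordering: list of entity sets, one per sentence, in presentation order.
    Counts adjacent pairs that share no entity (NOCB transitions); lower is better."""
    return sum(1 for a, b in zip(ordering, ordering[1:]) if not (a & b))

original = [{"Terry"}, {"Terry", "sailboat"}, {"Terry", "Tony"}, {"Tony"}]
jumbled = [{"Tony"}, {"Terry", "sailboat"}, {"Terry", "Tony"}, {"Terry"}]
print(nocb_count(original), nocb_count(jumbled))
```

The original order has no continuity violations, while the jumbled order has one, so M.NOCB prefers the original.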

47
Experimental methodology
  • Gold-standard ordering
  • The original order of the text (object
    description, news article)
  • Assume that other orderings are inferior
  • Classification error rate
  • Percentage of orderings that score better than the
    gold-standard + 0.5 × percentage of the orderings
    that score the same
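
The error rate defined above is easy to compute once every alternative ordering has been scored by a metric. A minimal sketch, assuming higher scores mean a better ordering (for a violation count such as M.NOCB, negate the counts first); the function name is hypothetical.

```python
def classification_error_rate(gold_score, other_scores):
    """Percentage of alternative orderings that score better than the gold standard,
    plus half the percentage that score the same. Assumes higher score = better."""
    better = sum(1 for s in other_scores if s > gold_score)
    equal = sum(1 for s in other_scores if s == gold_score)
    return 100.0 * (better + 0.5 * equal) / len(other_scores)

print(classification_error_rate(3, [1, 2, 3, 4]))
```

With one alternative scoring better and one tying out of four, the error rate is 37.5.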

48
Results
  • NOCB gives best results
  • Significantly better than the other metrics
  • Consistent results for three different corpora
  • Museum artifact descriptions (2)
  • News
  • Airplane accidents
  • M.BFP is the second best metric

50
Entity grid (Barzilay and Lapata, 2005, 2008)
  • Inspired by centering
  • Tracks entities across adjacent sentences, as
    well as their syntactic positions
  • Much easier to compute from raw text
  • Brown Coherence Toolkit
  • http://www.cs.brown.edu/~melsner/manual.html

51
Entity grid applications
  • Several applications, with very good results
  • Information ordering
  • Comparing the coherence of pairs of summaries
  • Distinguishing readability levels
  • Child vs. adult
  • Improves over Petersen & Ostendorf

52
Entity grid example
  • 1 [The Justice Department]S is conducting an
    [anti-trust trial]O against [Microsoft Corp.]X
    with [evidence]X that [the company]S is
    increasingly attempting to crush [competitors]O.
  • 2 [Microsoft]O is accused of trying to forcefully
    buy into [markets]X where [its own products]S are
    not competitive enough to unseat [established
    brands]O.
  • 3 [The case]S revolves around [evidence]O of
    [Microsoft]S aggressively pressuring [Netscape]O
    into merging [browser software]O.
  • 4 [Microsoft]S claims [its tactics]S are
    commonplace and good economically.
  • 5 [The government]S may file [a civil suit]O
    ruling that [conspiracy]S to curb [competition]O
    through [collusion]X is [a violation of the
    Sherman Act]O.
  • 6 [Microsoft]S continues to show [increased
    earnings]O despite [the trial]X.

53
Entity grid representation
54
16 entity grid features
  • The probability of each type of transition in the
    text
  • Four syntactic distinctions
  • S, O, X, _
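
The 16 features are the probabilities of the 4 × 4 possible role transitions between adjacent sentences. A minimal sketch, assuming the grid is already built and stored as one role string per entity; the three columns below are illustrative role strings loosely following the Microsoft example, not the output of a parser or coreference system.

```python
from collections import Counter
from itertools import product

ROLES = ["S", "O", "X", "_"]

def transition_probabilities(grid):
    """grid: {entity: string of roles, one character per sentence}.
    Returns the probability of each of the 16 length-2 transitions,
    pooled over all entity columns."""
    counts = Counter()
    for column in grid.values():
        for a, b in zip(column, column[1:]):
            counts[a + b] += 1
    total = sum(counts.values())
    return {a + b: counts[a + b] / total for a, b in product(ROLES, repeat=2)}

# Illustrative columns for a six-sentence text
grid = {"Microsoft": "SOSS_S", "trial": "O____X", "evidence": "X_O___"}
probs = transition_probabilities(grid)
print(probs["SS"], probs["__"])
```

Each text is thus represented by a fixed-length vector regardless of how many entities or sentences it has.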

55
Type of reference and info ordering (Elsner and
Charniak, 2008)
  • Entity grid features not concerned with how an
    entity is mentioned
  • Discourse old vs. discourse new
  • Kent Wells, a BP senior vice president said on
    Saturday during a technical briefing that the
    current cap, which has a looser fit and has been
    diverting about 15,000 barrels of oil a day to a
    drillship, will be replaced with a new one in 4
    to 7 days.
  • The new cap will take 4 to 7 days to be
    installed, and in case the new cap is not
    effective, Mr. Wells said engineers were prepared
    to replace it with an improved version of the
    current cap.

56
  • The probability of a given sequence of discourse
    new and old realizations gives a further
    indication about ordering
  • Similarly, pronouns should have reasonable
    antecedents
  • Adding both models to the entity grid improves
    performance on the information ordering task

57
Sentence Ordering
  • n sentences
  • Output from a generation or summarization system
  • Find most coherent ordering
  • n! permutations
  • With local coherence metrics
  • Adjacent sentence flow
  • Finding best ordering is NP complete
  • Reduction from Traveling Salesman Problem
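
For very small n the search over all n! permutations can still be done exhaustively, which makes the problem statement concrete. A minimal sketch: the word-overlap score is a toy stand-in for a real local coherence metric, and both function names are hypothetical.

```python
from itertools import permutations

def best_ordering(sentences, pair_score):
    """Exhaustively search all n! orderings; feasible only for small n.
    pair_score(a, b) is any local coherence score for sentence a followed by b."""
    def total(order):
        return sum(pair_score(a, b) for a, b in zip(order, order[1:]))
    return max(permutations(sentences), key=total)

def word_overlap(a, b):
    """Toy local coherence score (an assumption): number of shared lowercase word types."""
    return len(set(a.lower().split()) & set(b.lower().split()))

scrambled = ["the store was closing", "john wanted a piano", "john went to the piano store"]
print(best_ordering(scrambled, word_overlap))
```

Because this toy score is symmetric, an ordering and its reverse always tie; a realistic model needs direction-sensitive scores, and heuristic search replaces enumeration as n grows.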

58
Word co-occurrence model (Lapata, ACL 2003;
Soricut and Marcu, 2006) [23,24]
  • Idea from statistical machine translation
  • Alignment models
  • Translation: words in a text are aligned with
    words in its translation
  • John went to a restaurant. He ordered fish. The
    waiter was very attentive.
  • John est allé à un restaurant. Il ordonna de
    poisson. Le garçon était très attentif.
  • P(fish | poisson)
  • Coherence: words in a text are aligned with words
    in the text that follows
  • John went to a restaurant. He ordered fish. The
    waiter was very attentive.
  • He ordered fish. The waiter was very
    attentive. John gave him a huge tip.
  • We ate at a restaurant yesterday. We also ordered
    some take away.
  • P(ordered | restaurant), P(waiter | ordered),
    P(tip | waiter)
59
Discourse (coherence) relations
  • Only recently have empirical results shown that
    discourse relations are predictive of text
    quality (Pitler and Nenkova, 2008)

60
PDTB discourse relations annotations
  • Largest corpus of annotated discourse relations
  • http://www.seas.upenn.edu/~pdtb/
  • Four broad classes of relations
  • Contingency
  • Comparison
  • Temporal
  • Expansion
  • Explicit and implicit

61
Implicit and explicit relations
  • (E1) He is very tired because he played tennis
    all morning.
  • (E2) He is not very strong but he can run
    amazingly fast.
  • (E3) We had some tea in the afternoon and later
    went to a restaurant for a big dinner.
  • (I1) I took my umbrella this morning. [because]
    The forecast was for rain.
  • (I2) She is never late for meetings. [but] He
    always arrives 10 minutes late.
  • (I3) She woke up early. [afterwards] She had
    breakfast and went for a walk in the park.

62
What is the relative importance of factors in
determining text quality?
  • Competent readers (native English speakers)
  • graduate students at Penn
  • Wall Street Journal texts
  • 30 texts rated on a scale of 1 to 5
  • How well-written is this article?
  • How well does the text fit together?
  • How easy was it to understand?
  • How interesting is the article?

63
  • Several judgments for each text
  • Final quality score was the average
  • Scores range from 1.5 to 4.33
  • Mean 3.2

64
  • Which of the many indicators will work best?
  • Usually research studies focus on only one or two
  • How do indicators combine?
  • Metrics
  • Correlation coefficient
  • Accuracy of pair-wise ranking prediction
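
Both evaluation metrics can be sketched in plain Python. This is a minimal illustration: the function names are hypothetical, and the pairwise accuracy here simply skips pairs with tied gold ratings.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient between a feature and the assessor ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def pairwise_accuracy(predicted, gold):
    """Fraction of text pairs (with different gold ratings) that the predicted
    scores put in the same order as the gold ratings."""
    pairs = [(i, j) for i, j in combinations(range(len(gold)), 2) if gold[i] != gold[j]]
    correct = sum(1 for i, j in pairs
                  if (predicted[i] - predicted[j]) * (gold[i] - gold[j]) > 0)
    return correct / len(pairs)

print(pearson_r([1, 2, 3], [2, 4, 6]), pairwise_accuracy([0.1, 0.9, 0.5], [1, 3, 2]))
```

Correlation asks whether a single feature tracks the ratings linearly, while pairwise accuracy only asks whether a model orders two texts correctly.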

65
  • Correlation coefficients between assessor ratings
    and different features

66
Baseline measures
  • Average Characters/Word
  • r = -.0859 (p = .6519)
  • Average Words/Sentence
  • r = .1637 (p = .3874)
  • Max Words/Sentence
  • r = .0866 (p = .6489)
  • Article length
  • r = -.3713 (p = .0434)

67
Vocabulary factors
  • Language model probability of the article
  • M estimated from PTB (WSJ)
  • M estimated from general news (NEWS)

68
Correlations with well-written assessment
  • Log likelihood, WSJ
  • r = .3723 (p = .0428)
  • Log likelihood, NEWS
  • r = .4497 (p = .0127)
  • Log likelihood with length, WSJ
  • r = .3732 (p = .0422)
  • Log likelihood with length, NEWS
  • r = .6359 (p = .0002)

69
Syntactic features
  • Average parse tree height
  • r = -.0634 (p = .7439)
  • Avr. number of noun phrases per sentence
  • r = .2189 (p = .2539)
  • Average SBARs
  • r = .3405 (p = .0707)
  • Avr. number of verb phrases per sentence
  • r = .4213 (p = .0228)

70
Elements of lexical cohesion
  • Avr. cosine similarity between adjacent sentences
  • r = -.1012 (p = .5947)
  • Avr. word overlap between adjacent sentences
  • r = -.0531 (p = .7806)
  • Avr. Noun+Pronoun Overlap
  • r = .0905 (p = .6345)
  • Avr. Pronouns/Sent
  • r = .2381 (p = .2051)
  • Avr. Definite Articles
  • r = .2309 (p = .2196)

71
Correlation with well-written score
  • Prob. of S-S transition
  • r = -.1287 (p = .5059)
  • Prob. of S-O transition
  • r = -.0427 (p = .8261)
  • Prob. of S-X transition
  • r = -.1450 (p = .4529)
  • Prob. of S-N transition
  • r = .3116 (p = .0999)
  • Prob. of O-S transition
  • r = .1131 (p = .5591)
  • Prob. of O-O transition
  • r = .0825 (p = .6706)
  • Prob. of O-X transition
  • r = .0744 (p = .7014)
  • Prob. of O-N transition
  • r = .2590 (p = .1749)

72
  • Prob. of X-S transition
  • r = .1732 (p = .3688)
  • Prob. of X-O transition
  • r = .0098 (p = .9598)
  • Prob. of X-X transition
  • r = -.0655 (p = .7357)
  • Prob. of X-N transition
  • r = .1319 (p = .4953)
  • Prob. of N-S transition
  • r = .1898 (p = .3242)
  • Prob. of N-O transition
  • r = .2577 (p = .1772)
  • Prob. of N-X transition
  • r = .1854 (p = .3355)
  • Prob. of N-N transition
  • r = -.2349 (p = .2200)

73
Well-writtenness and discourse
  • Log likelihood of discourse rels
  • r = .4835 (p = .0068)
  • # of discourse relations
  • r = -.2729 (p = .1445)
  • Log likelihood of rels with # of rels
  • r = .5409 (p = .0020)
  • # of relations with # of words
  • r = .3819 (p = .0373)
  • Explicit relations only
  • r = .1528 (p = .4203)
  • Implicit relations only
  • r = .2403 (p = .2009)

74
Summary: significant factors
  • Log likelihood of discourse relations
  • r = .4835
  • Log likelihood, NEWS
  • r = .4497
  • Average verb phrases per sentence
  • r = .4213
  • Log likelihood, WSJ
  • r = .3723
  • Number of words
  • r = -.3713

75
Text quality prediction as ranking
  • Every pair of texts with ratings differing by 0.5
  • Features are the difference of feature values for
    each text
  • Task predict which of the two articles has
    higher text quality score
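
The construction of ranking examples can be sketched as follows. This is a minimal illustration: treating "differing by 0.5" as a minimum gap is an assumption, and the feature vectors and ratings are made-up toy values, not data from the study.

```python
from itertools import combinations

def pairwise_examples(texts, min_gap=0.5):
    """texts: list of (feature_vector, rating).
    Returns (difference_vector, label) pairs for every pair of texts whose ratings
    differ by at least min_gap; label is 1 if the first text is rated higher, else 0."""
    examples = []
    for (fa, ra), (fb, rb) in combinations(texts, 2):
        if abs(ra - rb) >= min_gap:
            diff = [a - b for a, b in zip(fa, fb)]
            examples.append((diff, 1 if ra > rb else 0))
    return examples

# Toy texts: (hypothetical feature values, average quality rating)
texts = [([0.0, 1.0], 2.5), ([1.0, 2.0], 4.0), ([0.5, 3.0], 3.8)]
examples = pairwise_examples(texts)
print(examples)
```

Any binary classifier can then be trained on the difference vectors, so that predicting the label amounts to predicting which article of the pair is better.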

76
Prediction accuracy (10-fold cross validation)
  • None (Majority Class): 50.21%
  • Number of words: 65.84%
  • ALL: 88.88%
  • Grid only: 79.42%
  • Log likelihood of discourse rels: 77.77%
  • Avg. VPs per sentence: 69.54%
  • Log likelihood, NEWS: 66.25%

77
Findings
  • Complex interplay between features
  • Entity grid features not significantly correlated
    with well-written score but very useful for the
    ranking task
  • Discourse information is very helpful
  • But here we used gold-standard annotations
  • Development of an automatic classifier is underway

78
Implicit and explicit discourse relations
Class         Explicit   Implicit
Comparison    69%        31%
Contingency   47%        53%
Temporal      80%        20%
Expansion     42%        58%
79
Sense classification based on connectives only
  • Four-way classification
  • Explicit relations only
  • 93% accuracy
  • All relations (implicit + explicit)
  • 75% accuracy
  • Implicit relations are the real challenge

80
Explicit discourse relations, tasks (Pitler and
Nenkova, 2009) [25]
  • Discourse vs. non-discourse use
  • I will be happier once the semester is over.
  • I have been to Ohio once.
  • Relation sense
  • Contingency, comparison, temporal, expansion
  • I haven't been to Paris since I went there on a
    school trip in 1998. (Temporal)
  • I haven't been to Antarctica since it is very far
    away. (Contingency)

81
Penn Discourse Treebank
  • Largest available annotated corpus of discourse
    relations
  • Penn Treebank WSJ articles
  • 18,459 explicit discourse relations
  • 100 connectives
  • although vs. or
  • 91% discourse vs. 3% discourse

82
Discourse Usage Experiments
  • Positive examples: discourse connectives
  • Negative examples: same strings in PDTB,
    unannotated
  • 10-fold cross validation
  • Maximum Entropy classifier

83
Discourse Usage Results
84
Discourse Usage Results
85
Sense Disambiguation: Comparison, Contingency,
Expansion, or Temporal?
Features                   Accuracy
Connective                 93.67%
Connective + Syntax        94.15%
Interannotator Agreement   94%
86
Tool
  • Automatic annotation of discourse use and sense
    of discourse connectives
  • Discourse Connectives Tagger
  • http://www.cis.upenn.edu/~epitler/discourse.html

87
What about implicit relations?
  • Is there hope to have a usable tool soon?
  • Early studies on unannotated data gave reason for
    optimism
  • But when recently tested on the PDTB, their
    performance is poor
  • Accuracy of contingency, comparison and temporal
    is below 50%

88
Classify implicits and explicits together
  • Not easy to infer from combined results how early
    systems performed on implicits
  • As we saw, one can get reasonable overall
    performance by doing nothing for explicits
  • Same sentence [26]
  • Graphbank corpus doesn't distinguish implicit
    and explicit [27]

89
Classify on large unannotated corpus
  • Create artificial implicits by deleting
    connective [28, 29, 30]
  • I am in Europe, but I live in the United States.
  • First proposed by Marcu and Echihabi, 2002
  • Very good initial results
  • Accuracy of distinguishing between two rels, > 75%
  • But these were on balanced classes
  • Not the case in real text
  • Not tested on real implicits (but see [30, 29])

90
Experiments with PDTB
  • Pitler et al., ACL 2009 [31]
  • Wide variety of features to capture semantic
    opposition and parallelism
  • Lin et al., EMNLP 2009 [32]
  • (Lexicalized) syntactic features
  • Results improve over baselines, better
    understanding of features, but the classifiers
    are not suitable for application in real tasks

91
Word pairs as features
  • Most basic feature for implicits
  • I_there, I_is, ..., tired_time, tired_difference
  • Example spans (Marcu and Echihabi, 2002): "I am a
    little tired" / "there is a 13 hour time
    difference"

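
The word pair features are simply the cross product of the words in the two spans. A minimal sketch, assuming whitespace tokenization; the function name is hypothetical.

```python
from itertools import product

def word_pair_features(span1, span2):
    """Cross product of the words in the two text spans, joined as 'w1_w2'."""
    return [f"{w1}_{w2}" for w1, w2 in product(span1.split(), span2.split())]

pairs = word_pair_features("I am a little tired", "there is a 13 hour time difference")
print(len(pairs), pairs[:2], pairs[-1])
```

Five words against seven give 35 features for this one example, which is why large amounts of training data are needed before such pairs become informative.
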
92
Intuition: with large amounts of data, we will find
semantically related pairs
  • The recent explosion of country funds mirrors the
    closed-end fund mania of the 1920s, Mr. Foot
    says, when narrowly focused funds grew wildly
    popular.
  • They fell into oblivion after the 1929 crash.

93
Meta error analysis of prior work
  • Using just content words reduces performance (but
    has steeper learning curve)
  • Marcu and Echihabi, 2002
  • Nouns and adjectives don't help at all
  • Lapata and Lascarides, 2004 [33]
  • Filtering out stopwords lowers results
  • Blair-Goldensohn et al., 2007

94
Word pairs experiments (Pitler et al., 2009)
  • Synthetic implicits: Cause/Contrast/None
  • Explicit instances from Gigaword with connective
    deleted
  • Because → Cause, But → Contrast
  • At least 3 sentences apart → None
  • Blair-Goldensohn et al., 2007
  • Random selection
  • 5,000 Cause
  • 5,000 Other
  • Computed information gain of word pairs
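
Information gain for a binary feature (a word pair being present or absent) can be computed as below. This is a minimal sketch; the function names, the four toy examples and their labels are assumptions for illustration only.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def information_gain(examples, feature):
    """examples: list of (set_of_word_pairs, label).
    Reduction in label entropy from splitting on presence of the word pair."""
    labels = [y for _, y in examples]
    with_f = [y for feats, y in examples if feature in feats]
    without_f = [y for feats, y in examples if feature not in feats]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_f, without_f) if part)
    return entropy(labels) - remainder

# Hypothetical training examples
examples = [
    ({"but_because"}, "Cause"),
    ({"but_because", "rain_wet"}, "Cause"),
    ({"rain_wet"}, "Other"),
    ({"not_yes"}, "Other"),
]
print(information_gain(examples, "but_because"), information_gain(examples, "rain_wet"))
```

A pair that separates the classes perfectly gets the maximum gain of one bit here, while a pair that is equally common in both classes gets zero.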

95
Function words have highest information gain
But... didn't we remove the connective?
96
"but" signals Not-Comparison in synthetic data
  • The government says it has reached most isolated
    townships by now, but because roads are blocked,
    getting anything but basic food supplies to
    people remains difficult.
  • but because → Comparison
  • but because → Contingency

97
Results: word pairs
  • Pairs of words from the two text spans
  • What doesn't work
  • Training on synthetic implicits
  • What really works
  • Use synthetic implicits for feature selection
  • Train on PDTB

98
Best Results: f-scores (baseline in parentheses)
Comparison    21.96 (17.13)
Contingency   47.13 (31.10)
Expansion     76.41 (63.84)
Temporal      16.76 (16.21)
  • Comparison/Contingency baseline: synthetic
    implicits word pairs
  • Expansion/Temporal baseline: real implicits word
    pairs
99
Further experiments using context
  • Results from classifying each relation
    independently
  • Naïve Bayes, MaxEnt, AdaBoost
  • Since context features were helpful, tried CRF
  • 6-way classification, word pairs as features
  • Naïve Bayes accuracy: 43.27%
  • CRF accuracy: 44.58%

100
Do we need more coherence factors? (Louis and
Nenkova, 2010) [34]
  • If we had perfect co-reference and discourse
    relation information, would we be able to explain
    local discourse coherence?
  • Our recent corpus study indicates the answer is
    NO
  • 30% of adjacent sentences in the same paragraph
    in PDTB
  • Neither share an entity nor have an implicit
    comparison, contingency or temporal relation
  • Lexical chains?

101
References
  • [1] Burstein, J. & Chodorow, M. (in press).
    Progress and new directions in technology for
    automated essay evaluation. In R. Kaplan (Ed.),
    The Oxford handbook of applied linguistics (2nd
    Ed.). New York: Oxford University Press.
  • [2] Heilman, M., Collins-Thompson, K., Callan,
    J., and Eskenazi, M. (2007). Combining Lexical
    and Grammatical Features to Improve Readability
    Measures for First and Second Language Texts.
    Proceedings of the Human Language Technology
    Conference. Rochester, NY.
  • [3] S. Petersen and M. Ostendorf, A machine
    learning approach to reading level assessment,
    Computer, Speech and Language, vol. 23, no. 1,
    pp. 89-106, 2009
  • [4] Finding High Quality Content in Social Media,
    Eugene Agichtein, Carlos Castillo, Debora Donato,
    Aristides Gionis, Gilad Mishne, ACM Web Search
    and Data Mining Conference (WSDM), 2008
  • [5] Regina Barzilay and Lillian Lee, Catching the
    Drift: Probabilistic Content Models, with
    Applications to Generation and Summarization,
    HLT-NAACL 2004: Proceedings of the Main
    Conference, pp. 113-120, 2004

102
References
  • [6] Emily Pitler, Annie Louis and Ani Nenkova,
    Automatic Evaluation of Linguistic Quality in
    Multi-Document Summarization, Proceedings of ACL
    2010
  • [7] Schwarm, S. E. and Ostendorf, M. 2005.
    Reading level assessment using support vector
    machines and statistical language models. In
    Proceedings of ACL 2005.
  • [8] Jieun Chae, Ani Nenkova: Predicting the
    Fluency of Text with Shallow Structural Features:
    Case Studies of Machine Translation and
    Human-Written Text. In Proceedings of EACL 2009,
    139-147
  • [9] Charniak, E. and Johnson, M. 2005.
    Coarse-to-fine n-best parsing and MaxEnt
    discriminative reranking. In Proceedings of ACL
    2005.
  • [10] K. Collins-Thompson and J. Callan. (2004). A
    language modeling approach to predicting reading
    difficulty. Proceedings of HLT/NAACL 2004.
  • [11] Sarah E. Schwarm and Mari Ostendorf. Reading
    Level Assessment Using Support Vector Machines
    and Statistical Language Models. In Proceedings
    of ACL, 2005.

103
References
  • [12] Automatically generating Wikipedia articles:
    A structure-aware approach, C. Sauper and R.
    Barzilay, ACL-IJCNLP 2009
  • [13] Halliday, M. A. K., and Ruqaiya Hasan.
    1976. Cohesion in English. London: Longman
  • [14] B. Grosz, A. Joshi, and S. Weinstein. 1995.
    Centering: a framework for modelling the local
    coherence of discourse. Computational
    Linguistics, 21(2):203-226
  • [15] E. Miltsakaki and K. Kukich. 2000. The role
    of centering theory's rough-shift in the teaching
    and evaluation of writing skills. In Proceedings
    of ACL'00, pages 408-415.
  • [16] Karamanis, N., Mellish, C., Poesio, M., and
    Oberlander, J. 2009. Evaluating centering for
    information ordering using corpora. Comput.
    Linguist. 35, 1 (Mar. 2009), 29-46.
  • [17] Regina Barzilay, Mirella Lapata, "Modeling
    Local Coherence: An Entity-based Approach",
    Computational Linguistics, 2008.
  • [18] Ani Nenkova, Kathleen McKeown: References to
    Named Entities: a Corpus Study. HLT-NAACL 2003

104
References
  • [19] Micha Elsner, Eugene Charniak:
    Coreference-inspired Coherence Modeling. ACL
    (Short Papers) 2008, 41-44
  • [20] Morris, J. and Hirst, G. 1991. Lexical
    cohesion computed by thesaural relations as an
    indicator of the structure of text. Comput.
    Linguist. 17, 1 (Mar. 1991), 21-48.
  • [21] Regina Barzilay and Michael Elhadad, "Text
    summarizations with lexical chains", In Inderjeet
    Mani and Mark Maybury, editors, Advances in
    Automatic Text Summarization. MIT Press, 1999.
  • [22] Silber, H. G. and McCoy, K. F. 2002.
    Efficiently computed lexical chains as an
    intermediate representation for automatic text
    summarization. Comput. Linguist. 28, 4 (Dec.
    2002), 487-496.
  • [23] Mirella Lapata, Probabilistic Text
    Structuring: Experiments with Sentence Ordering,
    Proceedings of ACL 2003.
  • [24] Discourse generation using utility-trained
    coherence models, R. Soricut & D. Marcu,
    COLING-ACL 2006

105
References
  • [25] Emily Pitler and Ani Nenkova. Using Syntax
    to Disambiguate Explicit Discourse Connectives in
    Text. Proceedings of ACL, short paper, 2009
  • [26] Radu Soricut and Daniel Marcu. 2003.
    Sentence Level Discourse Parsing using Syntactic
    and Lexical Information. Proceedings of the Human
    Language Technology and North American
    Association for Computational Linguistics
    Conference (HLT/NAACL-2003)
  • [27] Ben Wellner, James Pustejovsky, Catherine
    Havasi, Roser Sauri and Anna Rumshisky.
    Classification of Discourse Coherence Relations:
    An Exploratory Study using Multiple Knowledge
    Sources. In Proceedings of the 7th SIGDIAL
    Workshop on Discourse and Dialogue
  • [28] Daniel Marcu and Abdessamad Echihabi (2002).
    An Unsupervised Approach to Recognizing Discourse
    Relations. Proceedings of the 40th Annual Meeting
    of the Association for Computational Linguistics
    (ACL-2002)
  • [29] Sasha Blair-Goldensohn, Kathleen McKeown,
    Owen Rambow: Building and Refining
    Rhetorical-Semantic Relation Models. HLT-NAACL
    2007, 428-435

106
References
  • [30] Sporleder, C. and Lascarides, A. 2008. Using
    automatically labelled examples to classify
    rhetorical relations: An assessment. Nat. Lang.
    Eng. 14, 3 (Jul. 2008), 369-416.
  • [31] Emily Pitler, Annie Louis, and Ani Nenkova.
    Automatic Sense Prediction for Implicit Discourse
    Relations in Text. Proceedings of ACL, 2009.
  • [32] Ziheng Lin, Min-Yen Kan and Hwee Tou Ng
    (2009). Recognizing Implicit Discourse Relations
    in the Penn Discourse Treebank. In Proceedings of
    EMNLP
  • [33] Lapata, Mirella and Alex Lascarides. 2004.
    Inferring Sentence-internal Temporal Relations.
    In Proceedings of the North American Chapter of
    the Association for Computational Linguistics,
    153-160.
  • [34] Annie Louis and Ani Nenkova, Creating Local
    Coherence: An Empirical Assessment, Proceedings
    of NAACL-HLT 2010