Finding and Using Rhetorical-Semantic Relations in Text

Slides: 32

Provided by: SashaBlai

Learn more at: http://www.cs.columbia.edu

1
Finding and Using Rhetorical-Semantic Relations
in Text

Sasha Blair-Goldensohn
28 April 2005

2
Outline

Background
Relations and Definitional QA
Exploring Statistical Techniques for Relation
Finding
Using Mined Relations For Fun and Profit

3
Situating This Talk

Various levels of textual relations (a.k.a.
predicates)
Word-level, e.g. hypernym-hyponym
WordNet catalogs many of these
Syntactic, e.g. verb-argument
Propositional, e.g. agent-patient
Wide array of work on parsers for syntactic and
propositional structure can derive relations at
the sentence level
Rhetorical, e.g. cause-effect, contrast
Work in this domain is more theoretical; there is no
general-purpose parser
This talk:
How rhetorical-type relations can be useful for a
particular task
Interaction between rhetorical and word-level
relations
Experiments in detecting and using these
relations

4
Motivation

Definitional Questions
What/Who is X?
Concepts / Things / Processes: Muzak, thin layer
chromatography, Hogwarts, Aum Shinrikyo, etc.
People: Sonia Gandhi, Neil Diamond
Exploratory manual analysis of definitions
Some properties consistently good across topics
e.g., Superordinate, Cause-Effect, Contrast
Other good properties harder to generalize
Different for a chemical procedure (applications,
process components) vs. a cult (founder, beliefs,
membership)
Templates could be useful here for certain broad
categories (people, organizations, etc.)
but our focus is on a system to define any term

5
DefScriber: A Hybrid System

Knowledge-driven: three predicates (a.k.a.
relations)
Genus: category information ("Shiraz is a
grape.")
Species: differentiating the subject from other
category members ("Shiraz is used to make a
popular style of red wine.")
Sentences containing both Genus and Species are
identified by pattern
Non-specific Definitional (NSD): relevant
information that may be impractical to classify
generally ("Reds are now in favor in Australia,
but in the 1970s white wine was more popular.")
NSD sentences are identified (mainly) as a function of
term concentration
Data-driven: statistical, summarization-style
techniques to organize NSD information
Separate core concepts from more marginal ones
Cluster key subtopics
Order sentences using importance and cohesion

6
Pattern-Based Relation Identification (G-S)

[Figure: Example Sentence To Pattern; Matches Input Sentence]

7
Example Output (From DUC 2004)

Who is Sonia Gandhi?
Congress President Sonia
Gandhi, who married into what was once India's
most powerful political family, is the first
non-Indian since independence 50 years ago to
lead the Congress. After Prime Minister Rajiv
Gandhi was assassinated in 1991, Gandhi was
persuaded by the Congress to succeed her husband
to continue leading the party as the chief, but
she refused. The BJP had shrugged off the
influence of the 51-year-old Sonia Gandhi when
she stepped into politics early this year,
dismissing her as a foreigner. Sonia Gandhi is
now an Indian citizen. Gandhi, who is 51, met her
husband when she was an 18-year old student at
Cambridge in London, the first time she was away
from her native Italy.

Starting with Genus and Species information gives
answer context
Word-based chaining of concepts for cohesion
Use of pronoun rewriting (Nenkova, 2003) to
clarify initial references and make later ones
more fluid
Contrast reads well but we were just lucky!
Statistical analysis (data-driven techniques)
creates a definition that proceeds from more to
less central topics
Five sentences extracted from four
different documents

8
Some Formal Evaluations

Survey-based evaluation (2003)
Users rated five qualitative aspects of
definitions
Showed significant improvement over query-focused
multi-document summarization
Automatic and manual evaluations in the DUC 2004
"Who is X?" task
Best results among 22 teams in automated (ROUGE)
evaluation (significantly better than 20)
Less distinguished in manual evaluation of
coverage, responsiveness, and quality
Little significant difference: on average, 1.1 systems
better, 2 worse
Because extractive task?

9
Informal Observations

DefScriber Pros
Robust: data-driven approaches will provide an
answer for any topic, dynamically
Stock answer for "Why not use Google
definitions?"
Nice answers when we find a G-S sentence and we
have some coherent threads
Cons
Predicate coverage for G-S only
Data-driven techniques are limited
Similarity-based (word overlap)
Use data from retrieved documents only (modulo IDF)

10
Adding Predicates

We want to add predicates that are consistently
useful, e.g. Cause-Effect, Contrast
The syntax-tree pattern approach has high
precision (96%) but uneven recall, and requires
significant manual effort
Initial markup study indicates these predicates
are stated in highly varied ways, and not always
explicitly, e.g.
"Diabetes is a disease of the endocrine
system. Symptoms can include tiredness, thirst
and the need to urinate frequently."
Idea: a technique to determine a relation using
word pairs, even when it is not explicitly stated

11
Strengthening Data-driven Techniques

We want to strengthen our techniques, because
word-based similarity can limit us in some cases,
e.g.
We would like to follow
"Tachyons are a class of particles which are able
to travel faster than the speed of light."
with
"By extension of this terminology, particles that
travel slower than light are called tardyons, and
particles, such as photons, that travel exactly
at the speed of light are called luxons."
but the felicity of this combination, which is due to
Contrast, is missed by a similarity-based metric
Idea: a technique that uses relations, in addition to
similarity / identity, in a cohesion metric

12
Choosing an Approach

Learning relationship content, e.g. that disease
causes symptoms, or that faster contrasts with
slower
Echihabi and Marcu (2002) use cue phrases to mine
large corpora to construct a word-pair-based
classifier for four relations including Cause and
Contrast and detect these relations across
clauses or sentences
Lapata and Lascarides (2004) use a similar
approach for sentence-internal temporal relations
(Before, After, During, etc.) using word pairs
and other features like verb tenses
As opposed to learning patterns
Snow, Jurafsky et al. (2005) use a supervised
approach to learn patterns for the hypernymy
relation based on dependency trees,
e.g., "X is a Y", "X, Y and other Z", etc.
Some issues, including usefulness for non-explicit
relations and the cohesion application (more later)

13
The Approach

Begin by following Echihabi and Marcu
Compile a small set of cue phrases for each
relation, e.g.
Cause: "Because X, Y", "X. As a consequence, Y",
etc.
Contrast: "X. However, Y", "X even though Y",
etc.
Baseline: choose random non-contiguous sentences from
a document
Mine a large amount of (noisy) data
If we find a sentence "Because x1 x2 ... xn,
y1 y2 ... ym."
we note that the pairs (x1, y1) ... (xn, ym) were
observed in a causal setting
So if we find "Because of poaching, smuggling
and related treacheries, tigers, rhinos and
civets are endangered species."
our belief that the pair (poaching, endangered)
indicates a causal relationship is increased
Construct a naïve Bayes classifier such that, for two
text spans W1 and W2, the probability of relation
r_k is estimated as

P(r_k | W_1, W_2) \propto P(r_k) \prod_{(w_i, w_j) \in W_1 \times W_2} P((w_i, w_j) | r_k)
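
The mining step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two regular expressions, the tokenizer, and the function names `tokenize` and `mine_pairs` are hypothetical stand-ins, since the talk names cue phrases such as "Because X, Y" and "X. However, Y" but does not give the exact patterns used.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical cue-phrase patterns (assumed, not the talk's actual patterns).
CUE_PATTERNS = {
    "cause": re.compile(r"^Because (?P<x>[^,]+), (?P<y>.+?)\.?$", re.IGNORECASE),
    "contrast": re.compile(r"^(?P<x>.+?)\. However, (?P<y>.+?)\.?$", re.IGNORECASE),
}


def tokenize(span):
    """Lowercase a text span and split it into simple word tokens."""
    return re.findall(r"[a-z']+", span.lower())


def mine_pairs(texts):
    """Count word pairs (x_i, y_j) across the two spans of each cue-phrase match."""
    counts = {relation: Counter() for relation in CUE_PATTERNS}
    for text in texts:
        for relation, pattern in CUE_PATTERNS.items():
            match = pattern.match(text.strip())
            if match:
                x_tokens = tokenize(match["x"])
                y_tokens = tokenize(match["y"])
                # Every cross-span pair is recorded as evidence for the relation.
                counts[relation].update(product(x_tokens, y_tokens))
    return counts


if __name__ == "__main__":
    example = ("Because of poaching, smuggling and related treacheries, "
               "tigers, rhinos and civets are endangered species.")
    mined = mine_pairs([example])
    print(mined["cause"][("poaching", "endangered")])
```

On the poaching sentence this naive pattern splits at the first comma, so "smuggling and related treacheries" lands in the Y span. The pair (poaching, endangered) is still counted, but the example shows why the mined data is described as noisy.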

14
Goals

Attain good accuracy
Not essential to exceed previous numbers since we
are concerned with application
Apply model to address DefScriber cons
Make a system that can be used in an online
setting
Consider alternative uses for model

15
System Design

Corpus: AQUAINT collection (LDC) of approximately
20M sentences of newswire text from 1996-2000
Mined examples of Cause and Contrast
Approx. 407k Cause
Approx. 943k Contrast
Trained system on approx. 400k each, and added
400k "no relation" pairs as baseline
"No relation" is taken as sentence pairs from the
same document which are at least 3 sentences apart
64M word pairs with counts in a MySQL database
Efficiency concerns
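
The "no relation" baseline described above (sentence pairs from the same document, at least 3 sentences apart) could be sampled roughly as follows. The function name, the `seed` parameter, and the choice to enumerate all eligible index pairs are assumptions made for illustration; for long documents one would sample indices directly instead.

```python
import random


def sample_no_relation_pairs(sentences, n_pairs, min_gap=3, seed=0):
    """Sample sentence pairs from one document that are at least min_gap apart."""
    rng = random.Random(seed)
    # All index pairs (i, j) with j - i >= min_gap.
    candidates = [
        (i, j)
        for i in range(len(sentences))
        for j in range(i + min_gap, len(sentences))
    ]
    chosen = rng.sample(candidates, min(n_pairs, len(candidates)))
    return [(sentences[i], sentences[j]) for i, j in chosen]


if __name__ == "__main__":
    document = [f"Sentence {n}." for n in range(6)]
    for first, second in sample_no_relation_pairs(document, n_pairs=2):
        print(first, "|", second)
```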

16
Classification Task

Given two text spans, predict the relation
between them when cue patterns are removed
Used 10k held-out test examples for each relation
type
Baseline for binary classifier: 50%
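
A minimal sketch of the classification step, assuming a naive Bayes model over cross-span word pairs with add-lambda smoothing and a uniform prior over relations (the talk trains on equal-sized sets). The class name `WordPairNB`, the default `lam`, and the `vocab_size` used to size the space of possible pairs are hypothetical choices, not values from the talk.

```python
import math
from collections import Counter
from itertools import product


class WordPairNB:
    """Naive Bayes over word pairs drawn from two text spans."""

    def __init__(self, lam=0.1, vocab_size=10_000):
        self.lam = lam
        # Assumed number of possible word pairs (bins) for smoothing.
        self.bins = vocab_size ** 2
        self.pair_counts = {}
        self.totals = {}

    def train(self, relation, span_pairs):
        """span_pairs: iterable of (tokens_of_span1, tokens_of_span2)."""
        counts = self.pair_counts.setdefault(relation, Counter())
        for w1, w2 in span_pairs:
            counts.update(product(w1, w2))
        self.totals[relation] = sum(counts.values())

    def log_likelihood(self, relation, w1, w2):
        """Sum of smoothed log P(pair | relation) over all cross-span pairs."""
        counts = self.pair_counts[relation]
        denom = self.totals[relation] + self.lam * self.bins
        return sum(
            math.log((counts[pair] + self.lam) / denom)
            for pair in product(w1, w2)
        )

    def classify(self, w1, w2):
        """Return the relation with the highest likelihood (uniform prior)."""
        return max(
            self.pair_counts,
            key=lambda rel: self.log_likelihood(rel, w1, w2),
        )


if __name__ == "__main__":
    model = WordPairNB()
    model.train("cause", [(["poaching"], ["endangered"])] * 3)
    model.train("contrast", [(["faster"], ["slower"])])
    print(model.classify(["poaching"], ["endangered"]))
```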

17
Smoothing

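Our data is very sparse given the possible number
of word pairs (99% of possible pairs unseen in
400k no-relation sentence pairs)
Using Laplace smoothing, we estimate the
probability of a given word pair as

P(w_i, w_j | r_k) = \frac{C(w_i, w_j, r_k) + \lambda}{N_k + \lambda B}

where C(w_i, w_j, r_k) is the count of the pair under
relation r_k, N_k is the total pair count for r_k, and
B is the number of unseen events. But with
λ = 1, 94% of the probability space goes to
unseen events
We can experiment with smaller λ
Or estimate values empirically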
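
To see why the smoothing parameter matters so much on sparse data, here is a small worked sketch. It assumes the textbook add-lambda estimate normalized over all possible pairs (seen plus unseen), which may differ in detail from the formula on the slide; the counts are toy values, not the talk's data.

```python
def unseen_mass(n_tokens, n_seen_types, n_possible, lam):
    """Fraction of probability mass that add-lambda smoothing gives to unseen pairs.

    n_tokens:     total observed pair occurrences
    n_seen_types: number of distinct pairs observed at least once
    n_possible:   number of possible pairs (seen plus unseen)
    lam:          the smoothing parameter
    """
    n_unseen = n_possible - n_seen_types
    return lam * n_unseen / (n_tokens + lam * n_possible)


if __name__ == "__main__":
    # Toy setting: 100 observations spread over 10 of 1000 possible pairs.
    for lam in (1.0, 0.1, 0.01):
        print(lam, round(unseen_mass(100, 10, 1000, lam), 3))
```

With these toy counts, lambda = 1 hands 90% of the mass to unseen pairs, while lambda = 0.01 hands over 9%, which is the motivation for experimenting with smaller values.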

18
Effect of λ Parameter

19
Good-Turing Smoothing

Smooths all counts based on ratio of frequencies
of frequencies
Gives N1/N = .08 of the probability mass to unseen events
Depends on choice of smoothing function for
higher frequencies where we have few examples
In limited experiments, performed moderately
worse than LaPlace (within .05)
May improve with more data (and effort!)
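
The two Good-Turing quantities mentioned above can be sketched as follows. This is the unsmoothed textbook form: the adjusted count uses raw frequencies of frequencies, so it does not include the smoothing function for higher frequencies that the slide says the method depends on. Function names and the toy counts are assumptions.

```python
from collections import Counter


def unseen_mass_good_turing(counts):
    """Good-Turing estimate of total probability for unseen events: N1 / N."""
    freq_of_freq = Counter(counts.values())
    n1 = freq_of_freq[1]           # number of events seen exactly once
    n = sum(counts.values())       # total number of observations
    return n1 / n


def adjusted_count(r, counts):
    """Unsmoothed Good-Turing adjusted count r* = (r + 1) * N_{r+1} / N_r."""
    freq_of_freq = Counter(counts.values())
    if freq_of_freq[r] == 0:
        raise ValueError(f"no events were observed exactly {r} times")
    return (r + 1) * freq_of_freq[r + 1] / freq_of_freq[r]


if __name__ == "__main__":
    toy = Counter({("a", "b"): 1, ("c", "d"): 1, ("e", "f"): 2, ("g", "h"): 4})
    print(unseen_mass_good_turing(toy))
    print(adjusted_count(1, toy))
```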

20
Stemming

Experimented with Porter Stemmer to address
sparsity
Improves classification accuracy marginally (< 1
percent)
However, somewhat coarse-grained for other tasks
Currently using unstemmed models; lemmatization
might be better

21
Classification Results

22
Another Task: Term Suggestion

We can also use these models to look for pairs of
words which are most strongly linked for a given
relation, e.g. Contrast
Using a log-likelihood measure à la Dunning
Null hypothesis is that for two terms w and t,
the pair (w, t) is equally likely under the Contrast
model or not
H0: P(w, t | ContrastModel) = P(w, t | ¬ContrastModel) = P(w, t)
So given a word w, we wish to suggest the term(s)
t for which H0 is most unlikely
Issues: evaluation and sparsity
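
One way to realize the term-suggestion idea is Dunning's binomial log-likelihood ratio, comparing how often a pair occurs in the Contrast data against how often it occurs elsewhere. The sketch below is an assumption about how the test might be set up (pair count out of total pairs in each model); the function names and the toy counts are hypothetical.

```python
import math
from collections import Counter


def _log_l(p, k, n):
    """Binomial log-likelihood of k successes in n trials, with 0 * log(0) = 0."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1 - p)
    return ll


def llr(k1, n1, k2, n2):
    """Dunning's -2 log lambda for H0: both samples share one success rate."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    return 2 * (_log_l(p1, k1, n1) + _log_l(p2, k2, n2)
                - _log_l(p, k1, n1) - _log_l(p, k2, n2))


def suggest_terms(w, contrast_counts, other_counts, threshold):
    """Terms t whose pairing with w is significantly more frequent under Contrast."""
    n1 = sum(contrast_counts.values())
    n2 = sum(other_counts.values())
    scored = []
    for (first, t), k1 in contrast_counts.items():
        if first != w:
            continue
        k2 = other_counts[(w, t)]
        score = llr(k1, n1, k2, n2)
        # Keep only pairs over-represented in the Contrast data.
        if k1 / n1 > k2 / n2 and score > threshold:
            scored.append((t, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    contrast = Counter({("faster", "slower"): 30, ("faster", "the"): 10, ("x", "y"): 60})
    other = Counter({("faster", "slower"): 5, ("faster", "the"): 10, ("x", "y"): 85})
    print(suggest_terms("faster", contrast, other, threshold=10.0))
```

In the toy data, "the" pairs with "faster" equally often in both sets, so its ratio is zero and it is filtered out, while "slower" clears the threshold.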

23
Term Suggestion: an Example

Recall our example
Tachyons are a class of particles which are able
to travel faster than the speed of light.
By extension of this terminology, particles that
travel slower than light are called tardyons, and
particles, such as photons, that travel exactly
at the speed of light are called luxons.
Contrast terms above log-likelihood threshold
Speed: not, still, only, speed, average, exactly,
football, slower, dial, race, faster, isn't,
efficient, strength, toughness
Faster: buyer, perhaps, unk, speed
Class: not, restroom, island, mostly, individual,
down, lost, subject, guys, only, schools
Non-content terms: may indicate contrast language
Noise / context-specific suggestions
Useful terms: some antonyms, but also
pseudo-coordinates, and often the term itself; we
are more interested in rhetorical relevance
than in a strict relation
Seems promising, but only anecdotal evidence here

24
Applying to Definitional Answers

Several potential directions for algorithm input
from relation models
As additional weight when selecting next
sentence by measuring cause/contrast-ness of
pairing
Idea: encourage causal / contrast chains in the
definition
Could be done as classification or with term
suggestions
Use term suggestions to boost importance
measure at word level
Idea: even if a sentence doesn't seem ideal from
a cohesion perspective, it may be important
enough to insert anyway if it has strong relation
links with the cluster as a whole
"Needle in a haystack" issue
Which terms to use as seeds for suggestion?

25
Contrast Chain Weighting

Idea: use suggested terms rather than span
classifier, since textual regularities of adjacent
sentences may be missing
Algorithm:
Extract keywords K from current sentence
For each k in K:
Get terms T with LogLike(Contrast(t, k)) >
threshold
For each potential next sentence S, ContrastScore(S)
= WeightedOverlap(T, S)
Choose best next S as a function of
ContrastScore(S) and other weights
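
The algorithm above might look like this in code. Several details are assumptions for the sake of a runnable sketch: `suggest` is a hypothetical callback returning a term-to-score mapping for a keyword, the weighted overlap is taken to be the sum of scores of suggested terms present in the candidate, and the linear mix with `alpha` stands in for the unspecified "function of ContrastScore(S) and other weights".

```python
def contrast_score(keywords, candidate_tokens, suggest, threshold):
    """Weighted overlap between suggested contrast terms and a candidate sentence."""
    weights = {}
    for k in keywords:
        for term, score in suggest(k).items():
            if score > threshold:
                # Keep the strongest score if several keywords suggest a term.
                weights[term] = max(weights.get(term, 0.0), score)
    return sum(weights[t] for t in set(candidate_tokens) if t in weights)


def choose_next(keywords, candidates, suggest, threshold=10.0,
                other_score=None, alpha=0.5):
    """Pick the candidate maximizing a mix of contrast score and other weights."""
    def total(tokens):
        base = other_score(tokens) if other_score else 0.0
        contrast = contrast_score(keywords, tokens, suggest, threshold)
        return alpha * contrast + (1 - alpha) * base
    return max(candidates, key=total)


if __name__ == "__main__":
    # Toy suggestion table; scores are made up for illustration.
    table = {"faster": {"slower": 20.0, "speed": 12.0, "buyer": 3.0}}

    def suggest(keyword):
        return table.get(keyword, {})

    s1 = ["particles", "that", "travel", "slower", "than", "light"]
    s2 = ["the", "buyer", "left"]
    print(choose_next(["tachyons", "faster"], [s2, s1], suggest))
```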

26
Applying to Definitions: What is bankruptcy?

Old Answer: There are two types of bankruptcy -
Chapter 7 bankruptcy and Chapter 13
bankruptcy. People with insufficient assets or
income could still file a Chapter 7 bankruptcy,
which if approved by a judge erases debts
entirely after certain assets are forfeited. File
bankruptcy petition with the clerk of the
bankruptcy courts. Bankruptcy spawns new
restaurant Jan 25, 2005 Lansdale Reporter,
According to United States Bankruptcy Court
documents Memphis Magic filed for Chapter 11
bankruptcy on Oct. 29 which had voluntarily
... Some people file bankruptcy because of the
automatic stay provision, the part of the
bankruptcy code that offers legal protection
against bill collectors.
New Answer: There are two types of bankruptcy -
Chapter 7 bankruptcy and Chapter 13
bankruptcy. When a co-signer is involved in
consumer debt situations, a Chapter 13 proceeding
could protect the co-signer who has not also
filed for bankruptcy protection. People with
insufficient assets or income could still file a
Chapter 7 bankruptcy, which if approved by a
judge erases debts entirely after certain assets
are forfeited. Just filing the bankruptcy does
not breach the mortgage filing to make payments
according to the loan agreement is a
breach. Personal debt pushes more into bankruptcy
Jan 26, 2005 Manawatu Standard, The rules that
apply to personal bankruptcy are similar to those
that govern company bankruptcy the slate is
wiped clean after three years.

27
Further Uses for Model

For coherence/cohesion in general-purpose
summarization
For answering causal or comparative questions
Why did Dow-Corning go bankrupt?
Filter by terms that have a causal relationship
with bankruptcy
How fast is a lion?
Filter by terms that are contrasted with fast
As added weight on bootstrapped data for, e.g.
opinions
If we believe term X has strong positive
orientation, and we believe X causes/contrasts
reliably with Y, we can increase/decrease our
belief about the positive orientation of Y
As general tool for applications that can accept
weaker inferences in exchange for broad coverage

28
Alternatives

Couldn't you just use WordNet?
Certainly complementary
WordNet has issues of coverage
Number of terms, number of relations both limited
Much more precise, but doesn't clearly contain
things like the contrast between speed and
strength
Probabilities over relations
What about patterns?
Again complementary
Issues with explicit statement of relations
For methods like Snow et al., need training data

29
Issues

Sparsity
More effort into smoothing (class-based methods,
principled estimation for parameter-based
techniques)
Additional data, features
Pattern inaccuracy
Estimated at up to 15% by Echihabi; address
with syntax-aware patterns
e.g., "I think the bond is going to pass as it
is because it's an excellent proposal," she
said.
Pattern-learning can discover and rank patterns,
but most methods need training data
Evaluation
DUC, TREC, and others!

30
Wrap Up

Building a model of certain rhetorical-semantic
relations seems feasible
Validated previous work on classification
Exploring new avenues for applying these models
to QA, summarization, and beyond

31
Example Run: What is the Hajj?

Goal-Driven
Use definitional predicates such as Genus and
Species to search for sentences conveying
typical definitional information.
Implementation combines feature-based
classification and pattern recognition over
syntax trees.

Document Retrieval
The Hajj, or pilgrimage to Makkah (Mecca), is the
central duty of Islam. More than two million
Muslims are expected to take the Hajj this year.
Muslims must perform the hajj at least once in
their lifetime if physically and financially
able. The Hajj is a milestone event in a Muslim's
life. The annual hajj begins in the twelfth month
of the Islamic year (which is lunar, not solar,
so that hajj and Ramadan fall sometimes in
summer, sometimes in winter). The Hajj is a
week-long pilgrimage that begins in the 12th
month of the Islamic lunar calendar. Another
ceremony, which was not connected with the rites
of the Ka'ba before the rise of Islam, is the
Hajj, the annual pilgrimage to 'Arafat, about two
miles east of Mecca, toward Mina. The hajj is one
of five pillars that make up the foundation of
Islam.
11 Web documents, 1127 total sentences
Predicate Identification
383 Non-specific Definitional sentences
9 Genus-Species sentences, including:
1. The Hajj, or pilgrimage to Makkah (Mecca), is the central duty of Islam.
2. The Hajj is a milestone event in a Muslim's life.
3. The hajj is one of five pillars that make up the foundation of Islam.
4. The hajj is a week-long pilgrimage that begins in the 12th month of the Islamic lunar calendar.

Data-Driven
Adapt techniques from summarization to maximize
content importance, cohesion and coverage.
Implementation uses lexical distance for
centroid-based clustering and cohesion metrics

Data-Driven Analysis
Clusters, ordering information
Definition Creation
