Title: Finding and Using Rhetorical-Semantic Relations in Text
1Finding and Using Rhetorical-Semantic Relations
in Text
- Sasha Blair-Goldensohn
- 28 April 2005
2Outline
- Background
- Relations and Definitional QA
- Exploring Statistical Techniques for Relation Finding
- Using Mined Relations For Fun and Profit
3Situating This Talk
- Various levels of textual relations (a.k.a. predicates)
- Word-level, e.g. hypernym-hyponym
- WordNet catalogs many of these
- Syntactic, e.g. verb-argument
- Propositional, e.g. agent-patient
- A wide array of work on parsers for syntactic and propositional structure can derive relations at the sentence level
- Rhetorical, e.g. cause-effect, contrast
- Work in this domain is more theoretical, with no general-use parser
- This talk
- How rhetorical-type relations can be useful for a particular task
- Interaction between rhetorical and word-level relations
- Experiments in detecting and using these relations
4Motivation
- Definitional Questions
- What/Who is X?
- Concepts / Things / Processes: Muzak, thin layer chromatography, Hogwarts, Aum Shinrikyo, etc.
- People: Sonia Gandhi, Neil Diamond
- Exploratory manual analysis of definitions
- Some properties consistently good across topics
- e.g., Superordinate, Cause-Effect, Contrast
- Other good properties harder to generalize
- Different for a chemical procedure (applications, process components) vs. a cult (founder, beliefs, membership)
- Templates could be useful here for certain broad categories (people, organizations, etc.)
- but our focus is on a system to define any term
5DefScriber: A Hybrid System
- Knowledge-driven: three predicates (a.k.a. relations)
- Genus: category information ("Shiraz is a grape.")
- Species: differentiating the subject from other category members ("Shiraz is used to make a popular style of red wine.")
- Sentences containing both Genus and Species identified by pattern
- Non-specific Definitional (NSD): relevant information that may be impractical to classify generally ("Reds are now in favor in Australia, but in the 1970s white wine was more popular.")
- NSD sentences identified (mainly) by a function of term concentration
- Data-driven: statistical, summarization-esque techniques to organize NSD information
- Separate core concepts from more marginal ones
- Cluster key subtopics
- Order sentences using importance and cohesion
6Pattern-Based Relation Identification (G-S)
[Figure: Example Sentence -> Pattern -> Matches Input Sentence]
7Example Output (From DUC 2004)
Who is Sonia Gandhi? Congress President Sonia
Gandhi, who married into what was once India's
most powerful political family, is the first
non-Indian since independence 50 years ago to
lead the Congress. After Prime Minister Rajiv
Gandhi was assassinated in 1991, Gandhi was
persuaded by the Congress to succeed her husband
to continue leading the party as the chief, but
she refused. The BJP had shrugged off the
influence of the 51-year-old Sonia Gandhi when
she stepped into politics early this year,
dismissing her as a foreigner. Sonia Gandhi is
now an Indian citizen. Gandhi, who is 51, met her
husband when she was an 18-year old student at
Cambridge in London, the first time she was away
from her native Italy.
- Starting with Genus and Species information gives answer context
- Word-based chaining of concepts for cohesion
- Use of pronoun rewriting (Nenkova, 2003) to clarify initial references and make later ones more fluid
- Contrast reads well, but we were just lucky!
- Statistical analysis (data-driven techniques) creates a definition that proceeds from more to less central topics
- Five sentences extracted from four different documents
8Some Formal Evaluations
- Survey-based evaluation (2003)
- Users rated five qualitative aspects of definitions
- Showed significant improvement over query-focused multi-document summarization
- Automatic and manual evals in DUC 2004 "Who is X?" task
- Best results among 22 teams in automated (ROUGE) evaluation (significantly better than 20)
- Less distinguished in manual evaluation of coverage, responsiveness, and quality
- Little significant diff on avg, 1.1 systems better, 2 worse
- Because extractive task?
9Informal Observations
- DefScriber Pros
- Robust: data-driven approaches will provide an answer for any topic, dynamically
- Stock answer for "Why not use Google definitions?"
- Nice answers when we find a G-S sentence and we have some coherent threads
- Cons
- Predicate coverage for G-S only
- Data-driven techniques are limited
- Similarity-based (word-overlap)
- Use data from retrieved documents only (mod IDF)
10Adding Predicates
- We want to add predicates that are consistently useful, e.g. Cause-Effect, Contrast
- The syntax-tree pattern approach has high precision (96%) but uneven recall, and requires significant manual effort
- Initial markup study indicates these predicates are stated in highly varied ways, and not always explicitly
- E.g., "Diabetes is a disease of the endocrine system. Symptoms can include tiredness, thirst and the need to urinate frequently."
- Idea: A technique to determine a relation using word pairs, even when it is not explicitly stated
11Strengthening Data-driven Techniques
- We want to strengthen our techniques, because word-based similarity can limit us in some cases
- E.g., we would like to follow
- "Tachyons are a class of particles which are able to travel faster than the speed of light."
- with
- "By extension of this terminology, particles that travel slower than light are called tardyons, and particles, such as photons, that travel exactly at the speed of light are called luxons."
- but the felicitousness of this combination due to Contrast is missed by a similarity-based metric
- Idea: A technique to use relations in addition to similarity / identity in a cohesion metric
12Choosing an Approach
- Learning relationship content, e.g. that disease causes symptoms, or that faster contrasts with slower
- Echihabi and Marcu (2002) use cue phrases to mine large corpora to construct a word-pair-based classifier for four relations including Cause and Contrast and detect these relations across clauses or sentences
- Lapata and Lascarides (2004) use a similar approach for sentence-internal temporal relations (Before, After, During, etc.) using word pairs and other features like verb tenses
- As opposed to learning patterns
- Snow, Jurafsky et al. (2005) use a supervised approach to learn patterns for the hypernymy relation based on dependency trees
- e.g., "X is a Y", "X, Y and other Z", etc.
- Some issues including usefulness for non-explicit relations and cohesion application (more later)
13The Approach
- Begin by following Echihabi and Marcu
- Compile a small set of cue-phrases for each relation, e.g.
- Cause: "Because X, Y", "X. As a consequence, Y", etc.
- Contrast: "X. However, Y", "X even though Y", etc.
- Baseline: Choose random non-contiguous sents from a document
- Mine a large amount of (noisy) data
- If we find a sentence "Because x1 x2 ... xn, y1 y2 ... ym."
- we note down that pairs (x1, y1) ... (xn, ym) were observed in a causal setting
- So if we find "Because of poaching, smuggling and related treacheries, tigers, rhinos and civets are endangered species."
- our belief that the pair (poaching, endangered) indicates a causal relationship is increased
- Construct a naïve Bayes classifier such that, for two text spans W1 and W2, the probability of relation rk is estimated as
$P(r_k \mid W_1, W_2) \propto P(r_k) \prod_{(w_i, w_j) \in W_1 \times W_2} P((w_i, w_j) \mid r_k)$
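The mining step above can be sketched as follows. This is a minimal illustration, not the system's actual code: the two regular expressions, the tokenizer, and the names `CUE_PATTERNS`, `tokenize`, and `mine_pairs` are all assumptions made for the example. Note that the naive "Because X, Y" pattern splits at the first comma, so it mis-segments the poaching sentence; that is the kind of noise the slide refers to, and the pair (poaching, endangered) is still counted.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical, deliberately tiny cue-phrase patterns (one per relation).
# A real cue-phrase set would be larger and more careful about span boundaries.
CUE_PATTERNS = {
    "cause": [re.compile(r"^because\s+(?P<x>[^,]+),\s*(?P<y>.+)$", re.IGNORECASE)],
    "contrast": [re.compile(r"^(?P<x>.+?)\s+even though\s+(?P<y>.+)$", re.IGNORECASE)],
}


def tokenize(span):
    """Lowercase a span and keep alphabetic tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z']+", span.lower())


def mine_pairs(sentences):
    """Count (x_word, y_word) pairs for every sentence matching a cue pattern.

    Returns a dict mapping relation name to a Counter of word pairs.
    The cue phrase itself is excluded because only the X and Y groups are used.
    """
    counts = {rel: Counter() for rel in CUE_PATTERNS}
    for sent in sentences:
        text = sent.strip().rstrip(".").strip()
        for rel, patterns in CUE_PATTERNS.items():
            for pat in patterns:
                m = pat.match(text)
                if m:
                    xs = tokenize(m.group("x"))
                    ys = tokenize(m.group("y"))
                    # Cartesian product: every word in X paired with every word in Y.
                    counts[rel].update(product(xs, ys))
                    break
    return counts


mined = mine_pairs([
    "Because of poaching, smuggling and related treacheries, "
    "tigers, rhinos and civets are endangered species.",
    "He went out even though it rained.",
])
print(mined["cause"][("poaching", "endangered")])
```

Counting the full Cartesian product is what makes the data both large and sparse: a pair of 20-word spans contributes 400 pair observations.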
14Goals
- Attain good accuracy
- Not essential to exceed previous numbers since we are concerned with application
- Apply model to address DefScriber cons
- Make a system that can be used in an online setting
- Consider alternative uses for model
15System Design
- Corpus: AQUAINT collection (LDC) of approximately 20M sentences of newswire text from 1996-2000
- Mined examples of Cause and Contrast
- Approx 407k cause
- Approx 943k contrast
- Trained system on approx 400k each, and added 400k "no relation" as baseline
- "No relation" is taken as sentence pairs from the same document which are at least 3 sents apart
- 64M word pairs with counts in MySQL database
- Efficiency concerns
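The "no relation" baseline data described above could be drawn along these lines. This is a sketch under stated assumptions: the function name `sample_norel_pairs`, the uniform sampling, and the fixed seed are illustrative choices, not details given in the talk; only the "same document, at least 3 sentences apart" constraint comes from the slide.

```python
import random


def sample_norel_pairs(doc_sentences, n_pairs, min_gap=3, seed=0):
    """Sample sentence pairs from one document that are at least min_gap apart.

    doc_sentences is the ordered list of sentences in a single document.
    Returns a list of (earlier_sentence, later_sentence) tuples.
    """
    rng = random.Random(seed)
    n = len(doc_sentences)
    if n <= min_gap:
        return []  # document too short to contain any valid pair
    pairs = []
    for _ in range(n_pairs):
        i = rng.randrange(0, n - min_gap)   # leaves room for a partner
        j = rng.randrange(i + min_gap, n)   # guarantees j - i >= min_gap
        pairs.append((doc_sentences[i], doc_sentences[j]))
    return pairs


doc = ["sentence %d" % k for k in range(10)]
print(sample_norel_pairs(doc, n_pairs=3))
```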
16Classification Task
- Given two text spans, predict the relation between them when cue patterns are removed
- Used 10k held out test data for each relation type
- Baseline for binary classifier: 50%
17Smoothing
- Our data is very sparse given the possible number of word pairs (99% of possible pairs unseen in 400k norel sentence pairs)
- Using Laplace smoothing, we estimate the probability of a given word pair as
$P((w_i, w_j) \mid r_k) = \frac{C(w_i, w_j, r_k) + \lambda}{N_k + \lambda B}$
- Where C is the observed count of the pair under relation rk, N_k is the total pair count for rk, and B is the number of unseen events. But with λ = 1, 94% of the probability space goes to unseen events
- We can experiment with smaller λ
- Or estimate values empirically
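A small sketch of a word-pair naïve Bayes classifier with add-λ smoothing may make the trade-off concrete. It assumes the standard form (count + λ) / (N + λB), with B taken here as the number of possible pair types (left vocabulary size times right vocabulary size), which may differ from the definition used on the slide. The class `WordPairNB` and its method names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


class WordPairNB:
    """Naive Bayes over cross-span word pairs with add-lambda smoothing (sketch)."""

    def __init__(self, lam=1.0):
        self.lam = lam
        self.pair_counts = {}          # relation -> Counter of (w1, w2)
        self.totals = Counter()        # relation -> total pair observations
        self.n_examples = Counter()    # relation -> number of training span pairs
        self.left_vocab = set()
        self.right_vocab = set()

    def train(self, relation, span1, span2):
        pairs = list(product(span1, span2))
        self.pair_counts.setdefault(relation, Counter()).update(pairs)
        self.totals[relation] += len(pairs)
        self.n_examples[relation] += 1
        self.left_vocab.update(span1)
        self.right_vocab.update(span2)

    def _bins(self):
        # Assumption: B is the number of possible pair types.
        return len(self.left_vocab) * len(self.right_vocab)

    def log_prob_pair(self, relation, pair):
        count = self.pair_counts[relation][pair]
        return math.log((count + self.lam) / (self.totals[relation] + self.lam * self._bins()))

    def unseen_mass(self, relation):
        """Share of probability that smoothing hands to pairs never seen for relation."""
        unseen = self._bins() - len(self.pair_counts[relation])
        return self.lam * unseen / (self.totals[relation] + self.lam * self._bins())

    def classify(self, span1, span2):
        total = sum(self.n_examples.values())

        def score(rel):
            prior = math.log(self.n_examples[rel] / total)
            return prior + sum(self.log_prob_pair(rel, p) for p in product(span1, span2))

        return max(self.pair_counts, key=score)


nb = WordPairNB(lam=0.5)
nb.train("cause", ["poaching"], ["endangered"])
nb.train("contrast", ["faster"], ["slower"])
print(nb.classify(["faster"], ["slower"]), nb.unseen_mass("cause"))
```

Working in log space avoids underflow when spans are long, and `unseen_mass` shows directly how quickly a large λ drains probability away from observed pairs when most bins are empty.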
18Effect of λ Parameter
19Good-Turing Smoothing
- Smooths all counts based on ratio of frequencies of frequencies
- Gives N1/N = .08 probability to unseen events
- Depends on choice of smoothing function for higher frequencies where we have few examples
- In limited experiments, performed moderately worse than Laplace (within .05)
- May improve with more data (and effort!)
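The two Good-Turing quantities mentioned above can be computed directly from pair counts. This sketch uses the textbook unsmoothed estimates (unseen mass N1/N and adjusted count c* = (c+1) N_{c+1} / N_c); the function names and the toy counts are illustrative, and the smoothing function for higher frequencies that the slide alludes to is deliberately left out.

```python
from collections import Counter


def good_turing_unseen_mass(pair_counts):
    """Total probability reserved for unseen events: N1 / N."""
    n_total = sum(pair_counts.values())
    n1 = sum(1 for c in pair_counts.values() if c == 1)
    return n1 / n_total if n_total else 0.0


def frequencies_of_frequencies(pair_counts):
    """Map each count c to N_c, the number of pair types observed exactly c times."""
    return Counter(pair_counts.values())


def adjusted_count(c, freq_of_freq):
    """Unsmoothed Good-Turing count c* = (c + 1) * N_{c+1} / N_c.

    Returns None when N_c or N_{c+1} is zero, which is common at higher
    counts and is why a smoothing function over N_c is needed in practice.
    """
    n_c = freq_of_freq.get(c, 0)
    n_next = freq_of_freq.get(c + 1, 0)
    if n_c == 0 or n_next == 0:
        return None
    return (c + 1) * n_next / n_c


toy_counts = {("a", "b"): 1, ("c", "d"): 1, ("e", "f"): 2, ("g", "h"): 4}
fof = frequencies_of_frequencies(toy_counts)
print(good_turing_unseen_mass(toy_counts), adjusted_count(1, fof), adjusted_count(2, fof))
```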
20Stemming
- Experimented with Porter Stemmer to address sparsity
- Improves classification accuracy marginally (< 1 percent)
- However, somewhat coarse-grained for other tasks
- Currently using unstemmed models; lemmatization might be better
21Classification Results
22Another Task: Term Suggestion
- We can also use these models to look for pairs of words which are most strongly linked for a given relation, e.g. Contrast
- Using log-likelihood measure a la Dunning
- Null hypothesis is that for two terms w and t, the pair (w,t) is equally likely for the Contrast model or not
- H0: P(w,t | ContrastModel) = P(w,t | ¬ContrastModel) = P(w,t)
- So given a word w, we wish to suggest the term(s) t for which H0 is most unlikely
- Issues: Evaluation and Sparsity
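One way to realize the log-likelihood test is Dunning's G² statistic on a 2x2 table. The table layout below is an assumption, since the talk does not specify it: rows are the Contrast data and the non-Contrast data, and columns are "w paired with t" and "w paired with anything else". The function names, the toy counts, and the threshold of 3.84 (the 5% critical value of chi-square with one degree of freedom) are illustrative.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 = 2 * sum(O * ln(O / E)) over a 2x2 contingency table."""
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def suggest_terms(w, contrast_counts, other_counts, threshold=3.84):
    """Return (term, score) for terms t over-represented with w in the Contrast data."""
    c_total = sum(c for (a, _), c in contrast_counts.items() if a == w)
    o_total = sum(c for (a, _), c in other_counts.items() if a == w)
    suggestions = []
    for (a, t), k11 in contrast_counts.items():
        if a != w:
            continue
        k21 = other_counts.get((w, t), 0)
        k12, k22 = c_total - k11, o_total - k21
        # G^2 is two-sided, so keep only pairs more frequent under Contrast.
        over_represented = k11 * (k21 + k22) > k21 * (k11 + k12)
        score = log_likelihood_ratio(k11, k12, k21, k22)
        if over_represented and score > threshold:
            suggestions.append((t, score))
    return sorted(suggestions, key=lambda item: -item[1])


contrast = {("faster", "slower"): 30, ("faster", "the"): 70}
other = {("faster", "slower"): 2, ("faster", "the"): 98}
print(suggest_terms("faster", contrast, other))
```

The explicit over-representation check matters because a pair that is much rarer under Contrast also rejects H0, but is not a useful suggestion.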
23Term Suggestion: an Example
- Recall our example
- "Tachyons are a class of particles which are able to travel faster than the speed of light."
- "By extension of this terminology, particles that travel slower than light are called tardyons, and particles, such as photons, that travel exactly at the speed of light are called luxons."
- Contrast terms above log-likelihood threshold
- Speed: not, still, only, speed, average, exactly, football, slower, dial, race, faster, isn't, efficient, strength, toughness
- Faster: buyer, perhaps, unk, speed
- Class: not, restroom, island, mostly, individual, down, lost, subject, guys, only, schools
- Non-content terms: may indicate contrast language
- Noise / context-specific suggestions
- Useful terms: some antonyms, but also pseudo-coordinates, and often the term itself; we are more interested in rhetorical relevance than in strict relation
- Seems promising, but only anecdotal evidence here
24Applying to Definitional Answers
- Several potential directions for algorithm input from relation models
- As additional weight when selecting next sentence by measuring cause/contrast-ness of pairing
- Idea: encourage causal / contrast chains in the definition
- Could be done as classification or with term suggestions
- Use term suggestions to boost importance measure at word level
- Idea: even if a sentence doesn't seem ideal from a cohesion perspective, it may be important enough to insert anyway if it has strong relation links with the cluster as a whole
- "Needle in Haystack" issue
- Which terms to use as seeds for suggestion?
25Contrast Chain Weighting
- Idea: Use suggested terms rather than span classifier, since textual regularities of adjacent sentences may be missing
- Algorithm
- Extract keywords K from current sent
- For each k in K
- Get terms T with LogLike(Contrast(t,k)) > threshold
- For each potential next sent S, ContrastScore(S) = WeightedOverlap(T,S)
- Choose best next S as a function of ContrastScore(S) and other weights
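The algorithm above can be sketched in a few lines. Everything beyond the listed steps is an assumption for illustration: the names `contrast_score` and `choose_next`, the callback signatures for `suggest` and `other_score`, keeping the maximum weight when several keywords suggest the same term, and the linear mix controlled by `alpha`.

```python
def contrast_score(suggested, candidate_tokens):
    """WeightedOverlap(T, S): sum the weights of suggested terms present in S."""
    present = set(candidate_tokens)
    return sum(weight for term, weight in suggested.items() if term in present)


def choose_next(current_keywords, candidates, suggest, other_score, alpha=0.5):
    """Pick the next sentence by mixing the contrast score with other weights.

    suggest(k) returns (term, weight) pairs above the log-likelihood threshold
    for keyword k; other_score(tokens) stands in for the existing importance
    and cohesion weights. Both callbacks are hypothetical interfaces.
    """
    suggested = {}
    for k in current_keywords:
        for term, weight in suggest(k):
            # If several keywords suggest the same term, keep the strongest link.
            suggested[term] = max(weight, suggested.get(term, 0.0))

    def combined(tokens):
        return alpha * contrast_score(suggested, tokens) + (1 - alpha) * other_score(tokens)

    return max(candidates, key=combined)


def toy_suggest(keyword):
    table = {"faster": [("slower", 2.0), ("speed", 1.0)]}
    return table.get(keyword, [])


def toy_other_score(tokens):
    return 0.0  # ignore other weights in this toy run


candidates = [["particles", "travel", "slower", "light"], ["football", "season", "opens"]]
print(choose_next(["faster", "particles"], candidates, toy_suggest, toy_other_score))
```

With the toy suggestions, the tardyon-style candidate wins because it contains "slower", which is the behavior the tachyon example calls for.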
26Applying To Definitions: What is bankruptcy?
Old Answer: There are two types of bankruptcy -
Chapter 7 bankruptcy and Chapter 13
bankruptcy. People with insufficient assets or
income could still file a Chapter 7 bankruptcy,
which if approved by a judge erases debts
entirely after certain assets are forfeited. File
bankruptcy petition with the clerk of the
bankruptcy courts. Bankruptcy spawns new
restaurant Jan 25, 2005 Lansdale Reporter,
According to United States Bankruptcy Court
documents Memphis Magic filed for Chapter 11
bankruptcy on Oct. 29 which had voluntarily
... Some people file bankruptcy because of the
automatic stay provision, the part of the
bankruptcy code that offers legal protection
against bill collectors.
New Answer: There are two types of bankruptcy -
Chapter 7 bankruptcy and Chapter 13
bankruptcy. When a co-signer is involved in
consumer debt situations, a Chapter 13 proceeding
could protect the co-signer who has not also
filed for bankruptcy protection. People with
insufficient assets or income could still file a
Chapter 7 bankruptcy, which if approved by a
judge erases debts entirely after certain assets
are forfeited. Just filing the bankruptcy does
not breach the mortgage filing to make payments
according to the loan agreement is a
breach. Personal debt pushes more into bankruptcy
Jan 26, 2005 Manawatu Standard, The rules that
apply to personal bankruptcy are similar to those
that govern company bankruptcy the slate is
wiped clean after three years.
27Further Uses for Model
- For coherence/cohesion in general-purpose summarization
- For answering causal or comparative questions
- Why did Dow-Corning go bankrupt?
- Filter by terms that have causal relationship with "bankruptcy"
- How fast is a lion?
- Filter by terms that are contrasted with "fast"
- As added weight on bootstrapped data for, e.g., opinions
- If we believe term X has strong positive orientation, and we believe X causes/contrasts reliably with Y, we can increase/decrease our belief about the positive orientation of Y
- As general tool for applications that can accept weaker inferences in exchange for broad coverage
28Alternatives
- Couldn't you just use WordNet?
- Certainly complementary
- WN has issues of coverage
- Number of terms, number of relations both limited
- Much more precise, but doesn't clearly contain things like the contrast between speed and strength
- Probabilities over relations
- What about patterns?
- Again complementary
- Issues with explicit statement of relations
- For methods like Snow et al., need training data
29Issues
- Sparsity
- More effort into smoothing (class-based methods, principled estimation for parameter-based techniques)
- Additional data, features
- Pattern inaccuracy
- Estimated at up to 15% by Echihabi; address with syntax-aware patterns
- e.g., "I think the bond is going to pass as it is - because it's an excellent proposal," she said.
- Pattern-learning can discover and rank patterns, but most methods need training data
- Evaluation
- DUC, TREC, and others!
30Wrap Up
- Building a model of certain rhetorical-semantic relations seems feasible
- Validated previous work on classification
- Exploring new avenues for applying these models to QA, summarization, and beyond
31Example Run: What is the Hajj?
- Goal-Driven
- Use definitional predicates such as Genus and Species to search for sentences conveying typical definitional information.
- Implementation combines feature-based classification and pattern recognition over syntax trees.
Document Retrieval
The Hajj, or pilgrimage to Makkah (Mecca), is the
central duty of Islam. More than two million
Muslims are expected to take the Hajj this year.
Muslims must perform the hajj at least once in
their lifetime if physically and financially
able. The Hajj is a milestone event in a Muslim's
life. The annual hajj begins in the twelfth month
of the Islamic year (which is lunar, not solar,
so that hajj and Ramadan fall sometimes in
summer, sometimes in winter). The Hajj is a
week-long pilgrimage that begins in the 12th
month of the Islamic lunar calendar. Another
ceremony, which was not connected with the rites
of the Ka'ba before the rise of Islam, is the
Hajj, the annual pilgrimage to 'Arafat, about two
miles east of Mecca, toward Mina. The hajj is one
of five pillars that make up the foundation of
Islam.
11 Web documents, 1127 total sentences
Predicate Identification
383 Non-specific Definitional sentences
9 Genus-Species Sentences
1. The Hajj, or pilgrimage to Makkah (Mecca), is the central duty of Islam.
2. The Hajj is a milestone event in a Muslim's life.
3. The hajj is one of five pillars that make up the foundation of Islam.
4. The hajj is a week-long pilgrimage that begins in the 12th month of the Islamic lunar calendar.
- Data-Driven
- Adapt techniques from summarization to maximize content importance, cohesion and coverage.
- Implementation uses lexical distance for centroid-based clustering and cohesion metrics
Data-Driven Analysis
Clusters, ordering information
Definition Creation