Title: Finding and Using Rhetorical-Semantic Relations in Text


1
Finding and Using Rhetorical-Semantic Relations
in Text
  • Sasha Blair-Goldensohn
  • 28 April 2005

2
Outline
  • Background
  • Relations and Definitional QA
  • Exploring Statistical Techniques for Relation
    Finding
  • Using Mined Relations For Fun and Profit

3
Situating This Talk
  • Various levels of textual relations (a.k.a.
    predicates)
  • Word-level, e.g. hypernym-hyponym
  • WordNet catalogs many of these
  • Syntactic, e.g. verb-argument
  • Propositional, e.g. agent-patient
  • Wide array of work on parsers for syntactic and
    propositional structure can derive relations at
    the sentence level
  • Rhetorical, e.g. cause-effect, contrast
  • Work in this domain is more theoretical; there is
    no general-purpose parser
  • This talk
  • How rhetorical-type relations can be useful for a
    particular task
  • Interaction between rhetorical and word-level
    relations
  • Experiments in detecting and using these
    relations

4
Motivation
  • Definitional Questions
  • What/Who is X?
  • Concepts / Things / Processes: Muzak, thin-layer
    chromatography, Hogwarts, Aum Shinrikyo, etc.
  • People: Sonia Gandhi, Neil Diamond
  • Exploratory manual analysis of definitions
  • Some properties consistently good across topics
  • e.g., Superordinate, Cause-Effect, Contrast
  • Other good properties harder to generalize
  • Different for a chemical procedure (applications,
    process components) vs. a cult (founder, beliefs,
    membership)
  • Templates could be useful here for certain broad
    categories (people, organizations, etc.)
  • but our focus is on a system to define any term

5
DefScriber: A Hybrid System
  • Knowledge-driven: three predicates (a.k.a.
    relations)
  • Genus: category information ("Shiraz is a
    grape.")
  • Species: information differentiating the subject from
    other category members ("Shiraz is used to make a
    popular style of red wine.")
  • Sentences containing both Genus and Species are
    identified by pattern
  • Non-specific Definitional (NSD): relevant
    information that may be impractical to classify
    generally ("Reds are now in favor in Australia,
    but in the 1970s white wine was more popular.")
  • NSD sentences identified (mainly) as a function of
    term concentration (a minimal sketch follows this list)
  • Data-driven: statistical, summarization-style
    techniques to organize the NSD information
  • Separate core concepts from more marginal ones
  • Cluster key subtopics
  • Order sentences using importance and cohesion
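The slides do not show how term concentration is computed; the following is a
minimal, assumed sketch of one way to score a sentence by concentration of the
target term(s), not the published DefScriber implementation.

    # Assumed sketch: score a sentence by how concentrated the target term(s) are.
    def term_concentration(sentence, target_terms):
        words = [w.strip(".,").lower() for w in sentence.split()]
        if not words:
            return 0.0
        hits = sum(1 for w in words if w in target_terms)
        return hits / len(words)

    # Sentences scoring above some threshold would be kept as candidate NSD sentences.
    print(term_concentration("Reds are now in favor in Australia, but in the "
                             "1970s white wine was more popular.",
                             {"wine", "shiraz"}))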

6
Pattern-Based Relation Identification (G-S)
(Figure: an example input sentence matched against a Genus-Species pattern.)
7
Example Output (From DUC 2004)
Who is Sonia Gandhi? Congress President Sonia
Gandhi, who married into what was once India's
most powerful political family, is the first
non-Indian since independence 50 years ago to
lead the Congress. After Prime Minister Rajiv
Gandhi was assassinated in 1991, Gandhi was
persuaded by the Congress to succeed her husband
to continue leading the party as the chief, but
she refused. The BJP had shrugged off the
influence of the 51-year-old Sonia Gandhi when
she stepped into politics early this year,
dismissing her as a foreigner. Sonia Gandhi is
now an Indian citizen. Gandhi, who is 51, met her
husband when she was an 18-year old student at
Cambridge in London, the first time she was away
from her native Italy.
  • Starting with Genus and Species information gives
    answer context
  • Word-based chaining of concepts for cohesion
  • Use of pronoun rewriting (Nenkova, 2003) to
    clarify initial references and make later ones
    more fluid
  • The Contrast reads well, but we were just lucky!
  • Statistical analysis (data-driven techniques)
    creates a definition that proceeds from more to
    less central topics
  • Five sentences extracted from four different
    documents

8
Some Formal Evaluations
  • Survey-based evaluation (2003)
  • Users rated five qualitative aspects of
    definitions
  • Showed significant improvement over query-focused
    multi-document summarization
  • Automatic and manual evaluations in the DUC 2004 "Who is
    X?" task
  • Best results among 22 teams in the automated (ROUGE)
    evaluation (significantly better than 20)
  • Less distinguished in manual evaluation of
    coverage, responsiveness, and quality
  • Little significant difference on average: 1.1 systems
    better, 2 worse
  • Because it is an extractive task?

9
Informal Observations
  • DefScriber Pros
  • Robust: data-driven approaches will provide an
    answer for any topic, dynamically
  • Stock answer for "Why not use Google
    definitions?"
  • Nice answers when we find a G-S sentence and we
    have some coherent threads
  • Cons
  • Predicate coverage for G-S only
  • Data-driven techniques are limited
  • Similarity-based (word-overlap)
  • Use data from retrieved documents only (mod IDF)

10
Adding Predicates
  • We want to add predicates that are consistently
    useful, e.g. Cause-Effect, Contrast
  • The approach of syntax-tree patterns has high
    precision (96%) but uneven recall, and requires
    significant manual effort
  • An initial markup study indicates these predicates
    are stated in highly varied ways, and not always
    explicitly
  • E.g., "Diabetes is a disease of the endocrine
    system. Symptoms can include tiredness, thirst
    and the need to urinate frequently."
  • Idea: a technique to determine a relation using
    word pairs, even when it is not explicitly stated

11
Strengthening Data-driven Techniques
  • We want to strengthen our techniques, because
    word-based similarity can limit us in some cases,
    e.g.
  • We would like to follow
  • Tachyons are a class of particles which are able
    to travel faster than the speed of light.
  • With
  • By extension of this terminology, particles that
    travel slower than light are called tardyons, and
    particles, such as photons, that travel exactly
    at the speed of light are called luxons.
  • but the felicitousness of this combination, due to
    Contrast, is missed by a similarity-based metric
  • Idea: a technique that adds relations, in addition to
    similarity / identity, to the cohesion metric

12
Choosing an Approach
  • Learning relationship content, e.g. that disease
    causes symptoms, or that faster contrasts with
    slower
  • Echihabi and Marcu (2002) use cue phrases to mine
    large corpora to construct a word-pair-based
    classifier for four relations including Cause and
    Contrast and detect these relations across
    clauses or sentences
  • Lapata and Lascarides (2004) use a similar
    approach for sentence-internal temporal relations
    (Before, After, During, etc.) using word pairs
    and other features like verb tenses
  • As opposed to learning patterns
  • Snow, Jurafsky et al. (2005) use a supervised
    approach to learn patterns for the hypernymy
    relation based on dependency trees
  • e.g., "X is a Y", "X, Y and other Z", etc.
  • Some issues, including usefulness for non-explicit
    relations and for the cohesion application (more later)

13
The Approach
  • Begin by following Echihabi and Marcu
  • Compile a small set of cue phrases for each
    relation, e.g.
  • Cause: "Because X, Y", "X. As a consequence, Y",
    etc.
  • Contrast: "X. However, Y", "X even though Y",
    etc.
  • Baseline: choose random non-contiguous sentences
    from a document
  • Mine a large amount of (noisy) data
  • If we find a sentence "Because x1 x2 ... xn,
    y1 y2 ... ym."
  • we note down that the pairs (x1, y1) ... (xn, ym) were
    observed in a causal setting
  • So if we find "Because of poaching, smuggling
    and related treacheries, tigers, rhinos and
    civets are endangered species."
  • our belief that the pair (poaching, endangered)
    indicates a causal relationship is increased
  • Construct a naïve Bayes classifier such that, for two
    text spans W1 and W2, the probability of relation
    rk is estimated as in the reconstruction below
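The formula itself appears only as an image in the original slides; the
following is a reconstruction in the spirit of the Echihabi and Marcu approach
cited above (word-pair counts under a naïve independence assumption), not
necessarily the exact form shown on the slide:

    P(r_k \mid W_1, W_2) \;\propto\; P(r_k) \prod_{(w_i, w_j) \in W_1 \times W_2} P\big((w_i, w_j) \mid r_k\big),
    \qquad r^{*} \;=\; \arg\max_{k} \, P(r_k \mid W_1, W_2)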

14
Goals
  • Attain good accuracy
  • Not essential to exceed previous numbers since we
    are concerned with application
  • Apply model to address DefScriber cons
  • Make a system that can be used in an online
    setting
  • Consider alternative uses for model

15
System Design
  • Corpus: the AQUAINT collection (LDC) of approximately
    20M sentences of newswire text from 1996-2000
  • Mined examples of Cause and Contrast
  • Approx. 407k Cause
  • Approx. 943k Contrast
  • Trained the system on approx. 400k of each, and added
    400k "no relation" as a baseline
  • "No relation" is taken as sentence pairs from the
    same document which are at least 3 sentences apart
  • 64M word pairs with counts stored in a MySQL database
    (a storage sketch follows this list)
  • Efficiency concerns
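A minimal sketch of the pair-count store described above; sqlite3 is used here
only as a stand-in for the MySQL database on the slide, and the table and
column names are invented for illustration.

    import sqlite3

    conn = sqlite3.connect("pair_counts.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS pair_counts (
                      relation TEXT, w1 TEXT, w2 TEXT, cnt INTEGER,
                      PRIMARY KEY (relation, w1, w2))""")

    def add_pair(relation, w1, w2):
        # Insert the pair if unseen, then increment its count for this relation.
        conn.execute("INSERT OR IGNORE INTO pair_counts VALUES (?, ?, ?, 0)",
                     (relation, w1, w2))
        conn.execute("UPDATE pair_counts SET cnt = cnt + 1 "
                     "WHERE relation=? AND w1=? AND w2=?", (relation, w1, w2))

    def pair_count(relation, w1, w2):
        row = conn.execute("SELECT cnt FROM pair_counts "
                           "WHERE relation=? AND w1=? AND w2=?",
                           (relation, w1, w2)).fetchone()
        return row[0] if row else 0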

16
Classification Task
  • Given two text spans, predict the relation
    between them once cue patterns are removed (a
    cue-stripping sketch follows this list)
  • Used 10k held-out test examples for each relation
    type
  • Baseline for the binary classifier: 50%
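A minimal sketch of what removing cue patterns before classification could
look like; the listed cue phrases are illustrative examples taken from earlier
slides, not the full hand-compiled set actually used.

    import re

    # Illustrative cue phrases only; the real system used a larger set.
    CUE_PATTERNS = [r"\bbecause\b", r"\bas a consequence\b", r"\bhowever\b",
                    r"\beven though\b"]

    def strip_cues(text):
        for pattern in CUE_PATTERNS:
            text = re.sub(pattern, "", text, flags=re.IGNORECASE)
        return re.sub(r"\s+", " ", text).strip()

    print(strip_cues("Because of poaching, tigers are endangered."))
    # -> "of poaching, tigers are endangered."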

17
Smoothing
  • Our data is very sparse given the possible number
    of word pairs (99% of possible pairs unseen in
    400k no-relation sentence pairs)
  • Using Laplace smoothing, we estimate the
    probability of a given word pair as in the sketch below
  • Where B is the number of unseen events. But with
    λ = 1, 94% of the probability space goes to
    unseen events
  • We can experiment with smaller λ
  • Or estimate values empirically
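The smoothing formula is an image in the original slides; one standard
add-λ (Laplace) form consistent with the surrounding description is sketched
below, where c(x) is the count of word pair x, N the total number of observed
pairs, S the number of distinct pairs seen, and B the number of unseen events
as defined above. The exact formula on the slide may differ.

    P_{\lambda}(x) \;=\; \frac{c(x) + \lambda}{N + \lambda\,(S + B)}

Under this form the total mass assigned to unseen pairs is
λB / (N + λ(S + B)), which for λ = 1 and B far larger than N + S yields
roughly the 94% figure quoted above.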

18
Effect of λ Parameter
19
Good-Turing Smoothing
  • Smooths all counts based on the ratio of frequencies
    of frequencies (see the formulas below)
  • Gives N1/N ≈ .08 probability to unseen events
  • Depends on the choice of smoothing function for
    higher frequencies, where we have few examples
  • In limited experiments, performed moderately
    worse than Laplace (within .05)
  • May improve with more data (and effort!)
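For reference, the standard Good-Turing estimates consistent with the
description above are sketched below (N_r is the number of pair types seen
exactly r times, N the total number of observed pairs); the exact variant used
in these experiments may differ.

    r^{*} \;=\; (r + 1)\,\frac{N_{r+1}}{N_{r}}, \qquad P_{0} \;=\; \frac{N_{1}}{N}

Here P_0 is the total probability mass reserved for unseen events, matching
the N1/N ≈ .08 figure above; smoothing N_r itself for large r, where counts of
counts are sparse, is the "choice of smoothing function" issue noted on the slide.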

20
Stemming
  • Experimented with the Porter stemmer to address
    sparsity (a small illustration follows this list)
  • Improves classification accuracy marginally (< 1
    percent)
  • However, somewhat coarse-grained for other tasks
  • Currently using unstemmed models; lemmatization
    might be better
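A small illustration of Porter stemming; NLTK's implementation is used here
only as a stand-in for whatever stemmer the system actually used.

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    print([stemmer.stem(w) for w in ["causes", "causing", "endangered", "poaching"]])
    # Produces coarse stems such as 'caus' and 'endang' -- useful for collapsing
    # counts, but too coarse-grained for tasks like term suggestion, as noted above.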

21
Classification Results
22
Another Task: Term Suggestion
  • We can also use these models to look for pairs of
    words which are most strongly linked for a given
    relation, e.g. Contrast
  • Using a log-likelihood measure à la Dunning (a
    computation sketch follows this list)
  • The null hypothesis is that for two terms w and t,
    the pair (w,t) is equally likely under the Contrast
    model or not
  • H0: P((w,t) | ContrastModel) = P((w,t) | ¬ContrastModel)
    = P((w,t))
  • So given a word w, we wish to suggest the term(s)
    t for which H0 is most unlikely
  • Issues: evaluation and sparsity
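A minimal sketch (assumed implementation details) of a Dunning-style
log-likelihood ratio, computed from a 2x2 contingency table of how often the
pair (w, t) does and does not occur inside the mined Contrast data.

    import math

    def g2(k11, k12, k21, k22):
        """k11: (w,t) in Contrast data; k12: (w, not-t) in Contrast data;
        k21: (w,t) outside Contrast data; k22: (w, not-t) outside Contrast data."""
        n = k11 + k12 + k21 + k22
        if n == 0:
            return 0.0
        total = 0.0
        # G^2 = 2 * sum of observed * ln(observed / expected) over the four cells.
        for obs, row, col in [(k11, k11 + k12, k11 + k21),
                              (k12, k11 + k12, k12 + k22),
                              (k21, k21 + k22, k11 + k21),
                              (k22, k21 + k22, k12 + k22)]:
            expected = row * col / n
            if obs > 0:
                total += obs * math.log(obs / expected)
        return 2.0 * total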

23
Term Suggestion: an Example
  • Recall our example
  • Tachyons are a class of particles which are able
    to travel faster than the speed of light.
  • By extension of this terminology, particles that
    travel slower than light are called tardyons, and
    particles, such as photons, that travel exactly
    at the speed of light are called luxons.
  • Contrast terms above the log-likelihood threshold
  • Speed: not, still, only, speed, average, exactly,
    football, slower, dial, race, faster, isn't,
    efficient, strength, toughness
  • Faster: buyer, perhaps, unk, speed
  • Class: not, restroom, island, mostly, individual,
    down, lost, subject, guys, only, schools
  • Non-content terms: may indicate contrast language
  • Noise / context-specific suggestions
  • Useful terms: some antonyms, but also
    pseudo-coordinates, and often the term itself; we
    are more interested in rhetorical relevance than
    in the strict relation
  • Seems promising, but only anecdotal evidence here

24
Applying to Definitional Answers
  • Several potential directions for algorithm input
    from relation models
  • As an additional weight when selecting the next
    sentence, by measuring the cause/contrast-ness of
    the pairing
  • Idea: encourage causal / contrast chains in the
    definition
  • Could be done as classification or with term
    suggestions
  • Use term suggestions to boost the importance
    measure at the word level
  • Idea: even if a sentence doesn't seem ideal from
    a cohesion perspective, it may be important
    enough to insert anyway if it has strong relation
    links with the cluster as a whole
  • Needle-in-a-haystack issue
  • Which terms to use as seeds for suggestion?

25
Contrast Chain Weighting
  • Idea: use suggested terms rather than the span
    classifier, since the textual regularities of adjacent
    sentences may be missing
  • Algorithm (a sketch in code follows this list)
  • Extract keywords K from the current sentence
  • For each k in K
  • Get terms T with LogLike(Contrast(t,K)) >
    threshold
  • For each potential next sentence S, ContrastScore(S)
    = WeightedOverlap(T,S)
  • Choose the best next S as a function of
    ContrastScore(S) and other weights
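A minimal sketch of the algorithm above; extract_keywords is a naive stand-in
for the real keyword extractor, and `suggestions` stands in for the
log-likelihood term-suggestion model (a mapping from a keyword to scored
contrast terms). The final combination with other cohesion/importance weights
is omitted.

    def extract_keywords(sentence):
        # Naive stand-in for the real keyword extractor: longer content words.
        return {w.strip(".,").lower() for w in sentence.split() if len(w) > 4}

    def contrast_score(current_sent, candidate_sent, suggestions, threshold):
        suggested = {}                               # T, with weights
        for k in extract_keywords(current_sent):     # K
            for t, loglike in suggestions.get(k, []):
                if loglike > threshold:
                    suggested[t] = max(suggested.get(t, 0.0), loglike)
        candidate_words = extract_keywords(candidate_sent)
        # Weighted overlap between suggested contrast terms and the candidate.
        return sum(score for t, score in suggested.items() if t in candidate_words)

    def choose_next(current_sent, candidates, suggestions, threshold):
        # In the full system this score would be combined with other weights.
        return max(candidates,
                   key=lambda s: contrast_score(current_sent, s, suggestions, threshold))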

26
Applying To Definitions: What is bankruptcy?
Old Answer: There are two types of bankruptcy -
Chapter 7 bankruptcy and Chapter 13
bankruptcy. People with insufficient assets or
income could still file a Chapter 7 bankruptcy,
which if approved by a judge erases debts
entirely after certain assets are forfeited. File
bankruptcy petition with the clerk of the
bankruptcy courts. Bankruptcy spawns new
restaurant Jan 25, 2005 Lansdale Reporter,
According to United States Bankruptcy Court
documents Memphis Magic filed for Chapter 11
bankruptcy on Oct. 29 which had voluntarily
... Some people file bankruptcy because of the
automatic stay provision, the part of the
bankruptcy code that offers legal protection
against bill collectors.
New Answer: There are two types of bankruptcy -
Chapter 7 bankruptcy and Chapter 13
bankruptcy. When a co-signer is involved in
consumer debt situations, a Chapter 13 proceeding
could protect the co-signer who has not also
filed for bankruptcy protection. People with
insufficient assets or income could still file a
Chapter 7 bankruptcy, which if approved by a
judge erases debts entirely after certain assets
are forfeited. Just filing the bankruptcy does
not breach the mortgage; failing to make payments
according to the loan agreement is a
breach. Personal debt pushes more into bankruptcy
Jan 26, 2005 Manawatu Standard, The rules that
apply to personal bankruptcy are similar to those
that govern company bankruptcy the slate is
wiped clean after three years.
27
Further Uses for Model
  • For coherence/cohesion in general-purpose
    summarization
  • For answering causal or comparative questions
  • Why did Dow-Corning go bankrupt?
  • Filter by terms that have causal relationship
    with bankruptcy
  • How fast is a lion?
  • Filter by terms that are contrasted with fast
  • As added weight on bootstrapped data for, e.g.
    opinions
  • If we believe term X has strong positive
    orientation, and we believe X causes/contrasts
    reliably with Y, we can increase/decrease our
    belief about the positive orientation of Y
  • As general tool for applications that can accept
    weaker inferences in exchange for broad coverage

28
Alternatives
  • Couldn't you just use WordNet?
  • Certainly complementary
  • WN has issues of coverage
  • Number of terms, number of relations both limited
  • Much more precise, but doesn't clearly contain
    things like the contrast between speed and
    strength
  • Probabilities over relations
  • What about patterns?
  • Again complementary
  • Issues with explicit statement of relations
  • For methods like Snow et al., need training data

29
Issues
  • Sparsity
  • More effort into smoothing (class-based methods,
    principled estimation for parameter-based
    techniques)
  • Additional data, features
  • Pattern inaccuracy
  • Estimated at up to 15% by Echihabi; address
    with syntax-aware patterns
  • e.g., "I think the bond is going to pass as it
    is because it's an excellent proposal," she
    said.
  • Pattern-learning can discover and rank patterns,
    but most methods need training data
  • Evaluation
  • DUC, TREC, and others!

30
Wrap Up
  • Building a model of certain rhetorical-semantic
    relations seems feasible
  • Validated previous work on classification
  • Exploring new avenues for applying these models
    to QA, summarization, and beyond

31
Example Run: What is the Hajj?
  • Goal-Driven
  • Use definitional predicates such as Genus and
    Species to search for sentences conveying
    typical definitional information.
  • Implementation combines feature-based
    classification and pattern recognition over
    syntax trees.

Document Retrieval
The Hajj, or pilgrimage to Makkah (Mecca), is the
central duty of Islam. More than two million
Muslims are expected to take the Hajj this year.
Muslims must perform the hajj at least once in
their lifetime if physically and financially
able. The Hajj is a milestone event in a Muslim's
life. The annual hajj begins in the twelfth month
of the Islamic year (which is lunar, not solar,
so that hajj and Ramadan fall sometimes in
summer, sometimes in winter). The Hajj is a
week-long pilgrimage that begins in the 12th
month of the Islamic lunar calendar. Another
ceremony, which was not connected with the rites
of the Ka'ba before the rise of Islam, is the
Hajj, the annual pilgrimage to 'Arafat, about two
miles east of Mecca, toward Mina. The hajj is one
of five pillars that make up the foundation of
Islam.
11 Web documents, 1127 total sentences
Predicate Identification
383 Non-specific Definitional sentences
9 Genus-Species Sentences: 1. The Hajj, or
pilgrimage to Makkah (Mecca), is the central duty
of Islam. 2. The Hajj is a milestone event in a
Muslim 's life. 3. The hajj is one of five
pillars that make up the foundation of Islam. 4.
The hajj is a week-long pilgrimage that begins in
the 12th month of the Islamic lunar calendar.
  • Data-Driven
  • Adapt techniques from summarization to maximize
    content importance, cohesion and coverage.
  • Implementation uses lexical distance for
    centroid-based clustering and cohesion metrics

Data-Driven Analysis
Clusters, ordering information
Definition Creation