1
Improving LSA-based Summarization with Anaphora
Resolution
  • Josef Steinberger, Mijail A. Kabadjov, Massimo
    Poesio, Olivia Sanchez-Graillet

HLT/EMNLP 2005, Vancouver, Canada
2
Content
  • Exploiting coherence for summarization
  • Why anaphora resolution might help summarization
  • LSA-based Summarization
  • Anaphoric resolver GuiTAR
  • Combining lexical and anaphoric knowledge
  • Evaluation results
  • Conclusion and future work

3
Exploiting coherence for summarization
  • Lexical approaches
      • Lexical relations used to identify central terms
        (Barzilay and Elhadad, 1997; Gong and Liu, 2002)
  • Coreference-based approaches
      • Identifying central terms by running a coreference
        (anaphora) resolver over the text (Boguraev and
        Kennedy, 1997; Baldwin and Morton, 1998)
  • A combination of both?
      • Does adding the anaphoric information improve
        summarization performance?

4
Why anaphora resolution might help summarization
  • PRIEST IS CHARGED WITH POPE ATTACK
  • A Spanish priest was charged here today with
    attempting to murder the Pope. Juan Fernandez
    Krohn, aged 32, was arrested after a man armed
    with a bayonet approached the Pope while he was
    saying prayers at Fatima on Wednesday night.
    According to the police, Fernandez told the
    investigators today that he trained for the past
    six months for the assault. . . . If found
    guilty, the Spaniard faces a prison sentence of
    15-20 years.
  • (Boguraev and Kennedy, 1997)

5
Latent Semantic Analysis (LSA)
  • Technique for extracting hidden dimensions of the
    semantic representation of terms, sentences, or
    documents, on the basis of their contextual use
    (Landauer, 1997)
  • Used in various NLP applications (information
    retrieval: Berry et al., 1995; text segmentation:
    Choi et al., 2001)
  • Gong and Liu (2002): first LSA-based
    summarization approach

6
Singular Value Decomposition
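The figure for this slide is not part of the transcript. For reference, the standard decomposition that LSA relies on can be written as below. The symbol S for the diagonal matrix of singular values follows the "S.VT" notation on the LSA-based Summarization slide; the dimension names m (terms), n (sentences), and r (rank) are introduced here only for illustration.

```latex
A = U S V^{T}, \qquad
A \in \mathbb{R}^{m \times n},\;
U \in \mathbb{R}^{m \times r},\;
S = \operatorname{diag}(\sigma_1, \dots, \sigma_r),\;
V^{T} \in \mathbb{R}^{r \times n},\qquad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r \ge 0
```

Each row of V^T then corresponds to one latent topic, and each column of V^T to one sentence.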
7
LSA-based Summarization
  • Gong and Liu
      • For each row in V^T (topic), choose the sentence
        with the highest value (best description of the
        topic)
  • Our approach
      • Compute the length of each sentence vector in
        matrix S·V^T
      • Dimensionality reduction level (r) is learned
        from the data (take dimension i if sing_max /
        sing_i < threshold)
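The two selection strategies on this slide can be sketched with NumPy. This is a minimal sketch, not the authors' implementation: the function names, the toy term-by-sentence matrix, and the threshold value are assumptions made for illustration, and taking the absolute value in the Gong and Liu variant is also an assumption (singular vectors are only determined up to sign).

```python
import numpy as np

def gong_liu_select(vt, k):
    """For each of the first k rows of V^T (topics), pick the best unused sentence."""
    chosen = []
    for topic in vt[:k]:
        # Rank sentences by magnitude within this topic (abs is an assumption).
        for idx in np.argsort(-np.abs(topic)):
            if idx not in chosen:
                chosen.append(int(idx))
                break
    return chosen

def length_scores(sigma, vt, threshold):
    """Length of each sentence vector in S.V^T over the retained dimensions.

    Dimension i is retained only if sigma_max / sigma_i < threshold.
    """
    ratios = np.full_like(sigma, np.inf)
    positive = sigma > 0
    ratios[positive] = sigma[0] / sigma[positive]
    keep = ratios < threshold
    weighted = sigma[keep, None] * vt[keep]  # rows of V^T scaled by singular values
    return np.linalg.norm(weighted, axis=0)  # one score per sentence (column)

# Toy term-by-sentence matrix: 5 terms (rows), 4 sentences (columns).
A = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)
_, sigma, vt = np.linalg.svd(A, full_matrices=False)

scores = length_scores(sigma, vt, threshold=2.0)
ranking = np.argsort(-scores)  # sentence indices, highest score first
```

A summary of a given length would then take the top-ranked sentences from `ranking`, whereas `gong_liu_select(vt, k)` returns one sentence per topic.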

8
GuiTAR
  • Stands for General Tool for Anaphora Resolution
  • O-O architecture
  • XML in / XML out
  • Version 2.1 resolves
      • Definite descriptions (Vieira and Poesio, 2000)
          • Includes discourse-new classifier (Poesio et
            al., 2005)
      • Personal pronouns (Mitkov, 1998)
      • Possessive pronouns (adapted Mitkov's algorithm)

9
Combining lexical and anaphoric knowledge
Substitution method
  • GuiTAR as pre-processor
  • Example
      • S (original): If we don't do it now, Australia is
        going to be in deficit and debt into the next century.
      • S (after substitution): If Australia don't do spending
        cuts now, Australia is going to be in deficit and debt
        into the next century.
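As a rough illustration of the substitution step, the sketch below replaces each resolved anaphor with its antecedent before the text reaches the summarizer. The `(start, end, antecedent)` span format and the `substitute` function are hypothetical; GuiTAR's actual XML output format is not shown in these slides. Spans are assumed not to overlap.

```python
def substitute(tokens, resolutions):
    """Replace resolved anaphor spans with their antecedents.

    tokens: list of word strings.
    resolutions: list of (start, end, antecedent_tokens), with end exclusive.
    """
    out = list(tokens)
    # Apply from right to left so that earlier indices stay valid.
    for start, end, antecedent in sorted(resolutions, key=lambda r: r[0], reverse=True):
        out[start:end] = antecedent
    return out

tokens = "If we don't do it now".split()
resolved = substitute(tokens, [(1, 2, ["Australia"]), (4, 5, ["spending", "cuts"])])
print(" ".join(resolved))  # If Australia don't do spending cuts now
```

The substitution is purely mechanical, which is why the output keeps the original verb form ("don't") as in the slide's example.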

10
Combining lexical and anaphoric knowledge
Addition method
  • Modifying the source SVD matrix
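The slide gives no detail beyond modifying the input matrix. One possible reading, sketched below as an assumption rather than the authors' stated procedure, is to append one extra row per anaphoric chain to the term-by-sentence matrix, counting how often each sentence mentions that entity. The function name `add_chain_rows` and the chain representation are hypothetical.

```python
import numpy as np

def add_chain_rows(term_sentence, chains):
    """Append one row per anaphoric chain to a term-by-sentence matrix.

    chains: list of chains; each chain is a list of sentence indices,
    one entry per mention of the entity (hypothetical format).
    """
    n_sentences = term_sentence.shape[1]
    rows = np.zeros((len(chains), n_sentences))
    for r, mentions in enumerate(chains):
        for s in mentions:
            rows[r, s] += 1  # count mentions of this entity in sentence s
    return np.vstack([term_sentence, rows])

# Toy example: 3 terms, 4 sentences, 2 chains.
base = np.zeros((3, 4))
extended = add_chain_rows(base, [[0, 0, 2], [3]])
```

The extended matrix would then be passed to the SVD in place of the purely lexical one.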

11
Evaluation
  • 37 files from the CAST corpus of manually produced
    summaries (Orasan et al., 2003)
  • Anaphoric relations annotation
      • Parsed the corpus with Charniak's parser (2000)
      • Annotated with MMAX (Mueller and Strube, 2003)
  • Evaluation measures
      • Relative Utility (Radev et al., 2000)
      • Cosine Similarity
      • F-score
      • Main Topic Similarity (Steinberger and Jezek,
        2004)
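Two of the measures listed above have simple textbook definitions, sketched here for orientation. The slides do not specify the term weighting or the unit of comparison, so raw term frequencies and sets of selected sentence indices are assumptions, as are the function names.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def f_score(selected, reference):
    """Harmonic mean of precision and recall over selected sentence indices."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, `f_score({0, 1}, {1, 2})` has precision 0.5 and recall 0.5, so the F-score is 0.5.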

12
Evaluation: Upper bound
Table legend: not significant vs. significant (by t-test at 95%
confidence)
13
Evaluation: AR Performance
  • Anaphora resolution performance of GuiTAR v2.1

14
Evaluation: GuiTAR improvement
Table legend: not significant vs. significant (by t-test at 95%
confidence)
15
Conclusion and Future Work
  • Anaphoric information leads to a significant
    improvement in summarization performance
  • Results suggest that the better the AR performance,
    the greater the improvement
  • Next steps
      • Evaluate summarizer with GuiTAR v3.0 (PN)
      • Evaluate summarizer on DUC 2002 data:
        preliminary results rank our summarizer 3rd out of
        15 systems (measured by ROUGE)
      • Explore different weighting schemes (i.e., giving
        anaphors a higher score)