Language Models for Hierarchical Summarization - PowerPoint PPT Presentation

Slides: 29
Provided by: dawnl2
Learn more at: http://www.cs.loyola.edu
Transcript and Presenter's Notes

1
Language Models for Hierarchical Summarization
  • Dawn J. Lawrie
  • Loyola College in Maryland

2
The Problem
3
Hierarchical Summaries
  • Word-based summary
  • Focus on topics of the documents
  • Allows users to navigate through the results

4
Example
  • April 2003 -- Health officials from the World
    Health Organization and the Centers for Disease
    Control and Prevention are tracking worldwide
    outbreaks of a contagious and potentially fatal
    form of pneumonia. Known as Severe Acute
    Respiratory Syndrome (SARS), the disease first
    emerged in the Guangdong province of China but
    has since spread with alarming speed....
  • ...Computers at the British Columbia Cancer
    Agency in Vancouver completed work Saturday on
    the coronavirus, a new form of which is thought
    to cause SARS, the agency said....

Example hierarchy terms: pneumonia, SARS, Asia, coronavirus
5
Approach
  • Use language models to describe the text
    summarized by the hierarchy
  • Not using language models to predict text
  • Unigram and bigram models
  • Use approximation of relative entropy to find
    topic-subtopic relationships (predictiveness)
  • Use relative entropy to identify likely
    content-bearing words (topicality)

6
Applications of Hierarchical Summaries
  • Full text hierarchies
  • Uses all text in the document set as the basis
    for the hierarchy
  • Snippet hierarchies
  • Uses summaries of documents as the text for the
    basis of the hierarchy
  • Snippets contain about 30 words that are chosen from
    segments that contain query words
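
A minimal sketch of how such a snippet might be assembled. The slides do not define a segment or say how words are chosen within one, so the fixed-size segments, the whitespace tokenization, and the name `make_snippet` with its parameters are all assumptions for illustration.

```python
def make_snippet(text, query_words, max_words=30, segment_size=10):
    """Build a snippet of at most max_words words from segments containing query words."""
    words = text.split()
    query = {q.lower() for q in query_words}
    snippet = []
    # Assumption: a "segment" is a fixed-size run of words.
    for start in range(0, len(words), segment_size):
        segment = words[start:start + segment_size]
        # Keep the segment only if it contains at least one query word.
        if any(w.lower().strip(".,;:()") in query for w in segment):
            snippet.extend(segment)
        if len(snippet) >= max_words:
            break
    return " ".join(snippet[:max_words])
```

Calling `make_snippet(document_text, ["SARS"])` would return up to 30 words drawn only from segments that mention the query word.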

7
Example Snippet Hierarchy
8
Outline
  • Introduction
  • Related Work
  • Probabilistic Framework for Summarization
  • Evaluation
  • Conclusion

9
Related Work
  • Summarization
  • Abstracts (Conroy and O'Leary, 2001; Kupiec et
    al., 1995; Salton et al., 1997; and Zha, 2002)
  • Titles (Witbrock and Mittal, 1999)
  • Keywords (Witten et al., 1998)
  • Clustering
  • Method of organization (Agrawal et al., 2000;
    Cutting et al., 1992; Hofmann, 1999; Yand et al.,
    1998; Zamir et al., 1997; and Zamir and Etzioni,
    1999)
  • Hierarchies
  • Human and machine generated (Kosovac et al.,
    2000; Maedche et al., 2002; Sanderson and Croft,
    1999; Nevill-Manning et al., 1999; and Anick and
    Tipirneni, 1999)

10
Outline
  • Introduction
  • Related Work
  • Probabilistic Framework for Summarization
  • Evaluation
  • Conclusion

11
Model for Word-Based Summarization
  • Term Selection Formula
  • Top(w_i) refers to topicality
  • Probability that word w_i is in the set of topic words
  • Pred(w_i | w_1, ..., w_{i-1}) refers to predictiveness
  • Probability that word w_i is in the set of predictive
    words given that the previous words have already
    been selected as predictive words

12
Estimating Topicality
  • Term's contribution to relative entropy
  • Compares document model to general English Model
  • Estimate unigram language model
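
A rough sketch of a term's contribution to relative entropy (KL divergence) between a document unigram model and a general English model. The maximum-likelihood estimates, the `floor` probability for words missing from the general model, and the function names are assumptions; the slides do not say how the models are smoothed or how the contribution is turned into a topicality probability.

```python
import math
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood unigram model: P(w) = count(w) / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl_contributions(doc_tokens, general_tokens, floor=1e-6):
    """Per-term contribution P_doc(w) * log(P_doc(w) / P_general(w))."""
    p_doc = unigram_model(doc_tokens)
    p_gen = unigram_model(general_tokens)
    # Assumption: unseen words get a small floor probability in the general model.
    return {w: p * math.log(p / p_gen.get(w, floor)) for w, p in p_doc.items()}
```

Words that are much more frequent in the document set than in general English (for example "endangered") receive large positive contributions, while common function words receive small or negative ones.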

13
KL Example
endangered
14
Identifying Predictive Words
  • Difference between topic and non-topic words
  • Topics co-occur with a distinct set of words
    (subtopics)
  • Use bigram model
  • Estimate Bigram Language Model
  • x is the maximum distance between w_i and w
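
One possible reading of the windowed bigram model, sketched below: count how often a word w occurs within x positions of w_i and normalize over all co-occurrences of w_i. The symmetric window, the normalization, and the name `windowed_bigram_model` are assumptions, since the estimation formula itself is not preserved in this transcript.

```python
from collections import Counter, defaultdict

def windowed_bigram_model(tokens, x=2):
    """Estimate P_x(w | w_i) from co-occurrences within x positions of w_i."""
    pair_counts = defaultdict(Counter)
    for i, w_i in enumerate(tokens):
        lo = max(0, i - x)
        hi = min(len(tokens), i + x + 1)
        # Count every other word inside the window around position i.
        for j in range(lo, hi):
            if j != i:
                pair_counts[w_i][tokens[j]] += 1
    model = {}
    for w_i, counts in pair_counts.items():
        total = sum(counts.values())
        model[w_i] = {w: c / total for w, c in counts.items()}
    return model
```

Under this reading, a good topic word would be one whose conditional distribution concentrates on a distinct set of co-occurring words, which then become candidate subtopics.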

15
Approximation to Relative Entropy
  • Estimating predictiveness
  • Find topics using a heuristic for the Dominating Set
    Problem on graphs
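
The slides do not say which heuristic is used. As an illustration only, here is the standard greedy heuristic for dominating set: repeatedly pick the vertex that covers the most still-uncovered vertices. The graph representation (a dict from each word to the set of words it predicts) and the name `greedy_dominating_set` are assumptions, and edge weights such as predictiveness scores are ignored.

```python
def greedy_dominating_set(graph, max_topics=None):
    """Greedy heuristic: graph maps each vertex to the set of vertices it dominates."""
    uncovered = set(graph)
    for neighbors in graph.values():
        uncovered |= neighbors
    chosen = []
    while uncovered and (max_topics is None or len(chosen) < max_topics):
        # Pick the vertex covering the most uncovered vertices (ties: alphabetical).
        best = max(sorted(graph), key=lambda v: len(({v} | graph[v]) & uncovered))
        gain = ({best} | graph[best]) & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen
```

In the hierarchy setting, the chosen vertices would play the role of topic words and the vertices they cover would be the vocabulary they predict.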

16
Putting it All Together
  • 4-step process
  • (1) Preprocess document set
  • (2) Generate bigram language model
  • (3) Select the topic words
  • (4) Create a Hierarchy

recursive
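
The recursive structure of the process can be sketched as follows. This is not the presenter's implementation: `select_topics` is a deliberately simple placeholder that ranks words by document frequency instead of combining topicality and predictiveness, and all names and parameters are hypothetical.

```python
from collections import Counter

def select_topics(docs, exclude, k):
    """Placeholder topic selection: the k words appearing in the most documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc) - exclude)
    # Sort by descending document frequency, then alphabetically for determinism.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]

def build_hierarchy(docs, exclude=frozenset(), k=2, depth=2):
    """Recursively build a nested dict of topic words from tokenized documents."""
    if depth == 0 or not docs:
        return {}
    hierarchy = {}
    for topic in select_topics(docs, exclude, k):
        # Recurse on the documents that contain the chosen topic word.
        sub_docs = [d for d in docs if topic in d]
        hierarchy[topic] = build_hierarchy(sub_docs, exclude | {topic}, k, depth - 1)
    return hierarchy
```

Each level repeats the same selection step on the documents under the parent topic, which is what makes the construction recursive.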
17
Outline
  • Introduction
  • Related Work
  • Probabilistic Framework for Summarization
  • Evaluation
  • Conclusion

18
Evaluations
  • Non user-based evaluations
  • Summary Evaluation
  • Tests how well the topic terms chosen predict the
    vocabulary
  • Reachability Evaluation
  • Compare number of documents a user can find
  • Relevance Evaluation
  • Path length to find all relevant documents
  • User Study

19
Non User-Based Evaluation Test Set
  • Use 150 TREC queries
  • Document sets
  • 500 documents retrieved from associated TREC
    volumes
  • 500 documents retrieved from a news portion of
    associated TREC volumes
  • 1000 titles and snippets retrieved using Google
    Search Engine

20
Comparing to other hierarchies
  • Compare to subsumption and lexical hierarchies
  • Summary Evaluation
  • Subsumption Probabilistic Lexical
  • Reachability Evaluation
  • Probabilistic Lexical > Subsumption
  • Relevance Evaluation
  • Subsumption Probabilistic Lexical

Some lexical hierarchies were significantly
better than the probabilistic hierarchies
21
Comparison to Other Techniques
  • Compare words chosen for hierarchy to top TF.IDF
    words
  • Hierarchy words significantly better according to
    Summary evaluation
  • Compare access provided by hierarchy to access
    provided by ranked list
  • Full-text hierarchy with 10 topics and groups no
    larger than 30 gives access to 350 documents
  • Snippet hierarchy with 10 topics gives access to 250
    documents

22
User Study
  • TREC style study retrieving aspects of a topic
  • Users asked to find all documents relevant to the
    query
  • 12 users
  • 10 queries
  • Compare ranked list and hierarchy to ranked list
    alone

23
Results
  • Ranked list significantly better for aspectual
    recall
  • Hierarchy slightly better for precision, but not
    significantly
  • Recall performance seemed to improve during the
    course of using the hierarchy
  • All but one user preferred the hierarchy

24
Discussion
  • Choice of study
  • Believed a summary of results would help users find
    all aspects
  • Found users were unable to understand topics in the
    hierarchy because they were unfamiliar with the query topic
  • Better-suited task
  • User familiar with topic and interested in
    details
  • Able to understand what words and phrases
    describe

25
Outline
  • Introduction
  • Related Work
  • Probabilistic Framework for Summarization
  • Evaluation
  • Conclusion

26
Future Work
  • Improve the hierarchy
  • Refine estimations of topicality and
    predictiveness
  • Learn more about effect of different segment
    sizes and use of natural language features
  • Redesign the interface for the hierarchy so it is more
    user friendly
  • Develop a new topic model for snippets that
    better handles the tiny documents

27
Future Work
  • Explore the use of topic hierarchies in other
    organizational tasks
  • Personal collections of documents
  • E-mails
  • Create cross-lingual hierarchies using excerpts
    of documents
  • Compare auto-generated hierarchies to manually
    created ones

28
Conclusions
  • Developed a formal framework for topic
    hierarchies
  • Used language models as an abstraction of
    documents
  • Developed non user-based evaluations to combat
    problems associated with user studies