Title: I256: Applied Natural Language Processing
1. I256: Applied Natural Language Processing
Marti Hearst, Oct 2, 2006
2. Contents
- Introduction and Applications
- Types of summarization tasks
- Basic paradigms
- Single document summarization
- Evaluation methods
3. Introduction
- The problem: information overload
- 4 billion URLs indexed by Google
- 200 TB of data on the Web (Lyman and Varian 03)
- Information is created every day in enormous amounts
- One solution: summarization
- Abstracts promote current awareness
- save reading time
- facilitate selection
- facilitate literature searches
- aid in the preparation of reviews
- But what is an abstract??
4. Introduction
- Abstract
- a brief but accurate representation of the contents of a document
- Goal
- take an information source, extract the most important content from it, and present it to the user in a condensed form and in a manner sensitive to the user's needs
- Compression
- the amount of text to present, or the ratio of the length of the summary to the length of the source
5. History
- The problem has been addressed since the 50s (Luhn 58)
- Numerous methods are currently being suggested
- Most methods still rely on 50s-70s algorithms
- The problem is still hard, yet there are some applications
- MS Word
- www.newsinessence.com by Drago Radev's research group
7. MS Word AutoSummarize
8. Applications
- Abstracts for scientific and other articles
- News summarization (mostly multiple-document summarization)
- Classification of articles and other written data
- Web pages for search engines
- Web access from PDAs, Cell phones
- Question answering and data gathering
9. Types of Summaries
- Indicative vs. Informative
- Informative: a substitute for the entire document
- Indicative: gives an idea of what is there
- Background
- Does the reader have the needed prior knowledge?
- Expert reader vs. novice reader
- Query-based or General
- Query-based: a form is being filled in; specific questions should be answered
- General: general-purpose summarization
10. Types of Summaries (input)
- Single document vs multiple documents
- Domain specific (chemistry) or general
- Genre specific (newspaper items) or general
11. Types of Summaries (output)
- Extract vs. abstract
- Extracts: representative paragraphs/sentences/phrases/words, fragments of the original text
- Abstracts: a concise summary of the central subjects in the document
- Research shows that sometimes readers prefer extracts!
- Language chosen for summarization
- Format of the resulting summary (table/paragraph/key words)
12. Methods
- Quantitative heuristics, manually scored
- Machine-learning based statistical scoring methods
- Higher semantic/syntactic structures
- Network (graph) based methods
- Other methods (rhetorical analysis, lexical chains, co-reference chains)
- AI methods
13. Quantitative Heuristics
- General method
- score each entity (sentence, word), combine scores, choose the best sentence(s)
- Scoring techniques
- Word frequencies throughout the text (Luhn 58)
- Position in the text (Edmundson 69, Lin and Hovy 97)
- Title method (Edmundson 69)
- Cue phrases in sentences (Edmundson 69)
14. Using Word Frequencies (Luhn 58)
- Very first work in automated summarization
- Assumptions
- Frequent words indicate the topic
- "Frequent" is measured relative to corpus frequency
- Clusters of frequent words indicate a summarizing sentence
- Stemming based on similar prefix characters
- Very common words and very rare words are ignored
15. Ranked Word Frequency
Zipf's curve
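As a reminder (not on the original slide), Zipf's law says a word's frequency is roughly inversely proportional to its frequency rank:

$$f(r) \propto \frac{1}{r^{s}}, \qquad s \approx 1$$

A handful of very common words dominate the curve, which is why Luhn discards both the very common and the very rare words.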
16. Word Frequencies (Luhn 58)
- Find consecutive sequences of high-weight keywords
- Allow a certain number of gaps of low-weight terms
- Sentences with the highest sum of cluster weights are chosen
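A minimal Python sketch of Luhn-style cluster scoring under the assumptions above. The stopword set, the number of keywords kept, and the maximum gap size are illustrative choices rather than Luhn's exact parameters; the cluster score (significant words squared over cluster length) follows Luhn's significance factor.

```python
from collections import Counter

def luhn_scores(sentences, stopwords=frozenset(), top_k=100, max_gap=4):
    """Score each sentence by its best cluster of significant words.

    sentences: list of sentences, each a list of lowercase tokens.
    top_k and max_gap are illustrative settings.
    """
    # Significant words: the most frequent content words in the document.
    freq = Counter(w for sent in sentences for w in sent if w not in stopwords)
    significant = {w for w, _ in freq.most_common(top_k)}

    scores = []
    for sent in sentences:
        flags = [w in significant for w in sent]
        best = 0.0
        i = 0
        while i < len(sent):
            if not flags[i]:
                i += 1
                continue
            # Grow a cluster: allow at most max_gap low-weight words
            # between consecutive significant words.
            last_sig, count = i, 1
            j = i + 1
            while j < len(sent) and j - last_sig <= max_gap:
                if flags[j]:
                    last_sig, count = j, count + 1
                j += 1
            length = last_sig - i + 1
            best = max(best, count * count / length)  # significance factor
            i = last_sig + 1
        scores.append(best)
    return scores
```

Sentences can then be ranked by these scores and the top-scoring ones extracted.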
17. Position in the Text (Edmundson 69)
- Claim: important sentences occur in specific positions
- Lead-based summary
- Inverse of position in the document works well for news
- Important information occurs in specific sections of the document (introduction/conclusion)
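A small sketch of the lead-based reading of this heuristic; the exact weighting schemes in Edmundson 69 and Lin and Hovy 97 differ.

```python
def position_scores(sentences):
    """Inverse-of-position scoring: the earlier a sentence appears in the
    document, the higher its score (works well for news-style text)."""
    return [1.0 / (i + 1) for i in range(len(sentences))]
```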
18. Title Method (Edmundson 69)
- Claim: the title of a document indicates its content
- Unless editors are being cute
- Usually not true for novels
- What about blogs?
- Words in the title help find relevant content
- Create a list of title words, remove stop words
- Use those as keywords to find important sentences (for example, with Luhn's method)
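A sketch of the title-word heuristic; the normalization by title length is an assumption, not something specified on the slide.

```python
def title_scores(sentences, title, stopwords=frozenset()):
    """Score each sentence by its overlap with non-stopword title words.

    sentences: list of sentences, each a list of lowercase tokens.
    title: the document title as a list of lowercase tokens.
    """
    title_words = {w for w in title if w not in stopwords}
    if not title_words:
        return [0.0] * len(sentences)
    return [len(title_words & set(sent)) / len(title_words) for sent in sentences]
```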
19. Cue Phrases Method (Edmundson 69)
- Claim: important sentences contain cue words / indicative phrases
- "The main aim of the present paper is to describe..." (IND)
- "The purpose of this article is to review..." (IND)
- "In this report, we outline..." (IND)
- "Our investigation has shown that..." (INF)
- Some words are considered bonus, others stigma
- Bonus: comparatives, superlatives, conclusive expressions, etc.
- Stigma: negatives, pronouns, etc.
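A toy version of cue-phrase scoring; the bonus and stigma word sets below are illustrative placeholders, not Edmundson's actual dictionaries.

```python
BONUS = {"significant", "best", "greatest", "conclude", "therefore", "important"}
STIGMA = {"hardly", "impossible", "he", "she", "they", "it"}

def cue_scores(sentences, bonus=BONUS, stigma=STIGMA):
    """Score each sentence as (# bonus words) - (# stigma words)."""
    return [sum(w in bonus for w in sent) - sum(w in stigma for w in sent)
            for sent in sentences]
```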
20. Feature Combination (Edmundson 69)
- Linear combination of 4 features
- title, cue, keyword, position
- the weights are adjusted using training data with any minimization technique
- Evaluated on a corpus of 200 chemistry articles
- Length ranged from 100 to 3900 words
- Judges were told to extract 25% of the sentences, to maximize coherence and minimize redundancy
- Features
- Position (sensitive to types of headings for sections)
- Cue
- Title
- Keyword
- Best results obtained with cue + title + position
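A sketch of the linear combination, assuming per-sentence scores from the feature sketches above; the weights are placeholders to be fit on training data.

```python
def combine_scores(feature_scores, weights):
    """Weighted linear combination of per-sentence feature scores.

    feature_scores: dict mapping feature name -> list of scores (one per sentence).
    weights: dict mapping feature name -> weight (e.g. fit on training data).
    """
    n = len(next(iter(feature_scores.values())))
    return [sum(weights.get(f, 0.0) * scores[i] for f, scores in feature_scores.items())
            for i in range(n)]

# Example with hypothetical weights (Edmundson's best setting used cue, title,
# and position, with keyword contributing little):
# combined = combine_scores(
#     {"cue": cue, "title": title, "position": pos, "keyword": key},
#     {"cue": 1.0, "title": 1.0, "position": 1.0, "keyword": 0.0},
# )
```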
21. Bayesian Classifier (Kupiec et al. 95)
- Statistical learning method
- Feature set
- sentence length
- S > 5 (true if the sentence is longer than 5 words)
- fixed phrases
- 26 manually chosen
- paragraph
- sentence position in paragraph
- thematic words
- binary: whether the sentence is included in the manual extract
- uppercase words
- not common acronyms
- Corpus
- 188 document/summary pairs from scientific journals
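A rough sketch of computing these features per sentence. The fixed-phrase list, acronym list, thematic-word selection, and the thematic threshold are illustrative stand-ins for the 26 manually chosen phrases and other resources used in the paper, and the paragraph feature is simplified to "paragraph-initial".

```python
from collections import Counter

def kupiec_features(sentences, paragraph_starts, fixed_phrases, acronyms,
                    stopwords=frozenset(), n_thematic=25):
    """Binary features per sentence in the spirit of Kupiec et al. 95.

    sentences: list of sentences, each a list of tokens (original casing).
    paragraph_starts: set of sentence indices that begin a paragraph.
    fixed_phrases, acronyms: illustrative stand-ins for the paper's lists.
    """
    lower = [[w.lower() for w in s] for s in sentences]
    freq = Counter(w for s in lower for w in s if w not in stopwords)
    thematic = {w for w, _ in freq.most_common(n_thematic)}

    feats = []
    for i, sent in enumerate(sentences):
        text = " ".join(lower[i])
        feats.append({
            "length": len(sent) > 5,                                # length cut-off
            "fixed_phrase": any(p in text for p in fixed_phrases),  # cue phrases
            "paragraph_initial": i in paragraph_starts,             # paragraph feature
            "thematic": sum(w in thematic for w in lower[i]) >= 2,  # thematic words
            "uppercase": any(w.isupper() and len(w) > 1 and w.lower() not in acronyms
                             for w in sent),                        # uppercase words
        })
    return feats
```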
22. Bayesian Classifier (Kupiec et al. 95)
- Assuming statistical independence of the features:
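The original slide showed the scoring formula as an image; reconstructed here from the paper, where s is a sentence, S the set of summary sentences, and F_1, ..., F_k the features:

$$P(s \in S \mid F_1,\dots,F_k) \;=\; \frac{P(F_1,\dots,F_k \mid s \in S)\,P(s \in S)}{P(F_1,\dots,F_k)} \;\approx\; \frac{\prod_{j=1}^{k} P(F_j \mid s \in S)\; P(s \in S)}{\prod_{j=1}^{k} P(F_j)}$$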
23. Bayesian Classifier (Kupiec et al. 95)
- Each probability is estimated empirically from a corpus
- Higher-probability sentences are chosen to be in the summary
- Performance
- For 25% summaries, 84% precision
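A minimal sketch of estimating the probabilities by counting over a labeled corpus and scoring new sentences. The add-one smoothing is an assumption, not the paper's exact estimation procedure; the feature dictionaries are the ones produced by the kupiec_features sketch above.

```python
from collections import defaultdict

def train_probs(examples):
    """Empirically estimate P(F_j = v | s in S) and P(F_j = v) by counting.

    examples: list of (features, label) pairs, one per sentence, where
    features is a dict of feature name -> value and label is True if the
    sentence appears in the manual extract.
    """
    count_fv = defaultdict(int)       # counts of (feature, value) over all sentences
    count_fv_pos = defaultdict(int)   # same, restricted to summary sentences
    n = len(examples)
    n_pos = sum(1 for _, label in examples if label)

    for features, label in examples:
        for f, v in features.items():
            count_fv[(f, v)] += 1
            if label:
                count_fv_pos[(f, v)] += 1

    prior = n_pos / n

    def score(features):
        """Naive-Bayes-style score, proportional to P(s in S | F_1..F_k)."""
        s = prior
        for f, v in features.items():
            p_given_pos = (count_fv_pos[(f, v)] + 1) / (n_pos + 2)  # add-one smoothing
            p = (count_fv[(f, v)] + 1) / (n + 2)
            s *= p_given_pos / p
        return s

    return score
```

Scoring every sentence with the returned function and keeping the top-scoring sentences up to the target length gives the extract.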
24. Evaluation Methods
- When a manual summary is available
- 1. Choose a granularity (clause / sentence / paragraph)
- 2. Create a similarity measure for that granularity (word overlap, multi-word overlap, perfect match)
- 3. Measure the similarity of each unit in the new summary to the most similar unit(s) in the manual one
- 4. Measure recall and precision (see the sketch below)
- Otherwise
- 1. Intrinsic: how good is the summary as a summary?
- 2. Extrinsic: how well does the summary help the user?
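A sketch of step 4 at sentence granularity, with exact match as the similarity measure (the simplest combination from the list above).

```python
def extract_precision_recall(system_sents, reference_sents):
    """Precision and recall of a system extract against a manual extract,
    treating each sentence as one unit and using exact string match."""
    system, reference = set(system_sents), set(reference_sents)
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall
```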
25. Intrinsic Measures
- Intrinsic measures (glass-box): how good is the summary as a summary?
- Problem: how do you measure the goodness of a summary?
- Studies compare to an ideal (Edmundson 69; Kupiec et al. 95; Salton et al. 97; Marcu 97) or supply criteria: fluency, informativeness, coverage, etc. (Brandow et al. 95)
- Summary evaluated on its own or by comparing it with the source
the source - Is the text cohesive and coherent?
- Does it contain the main topics of the document?
- Are important topics omitted?
26. Extrinsic Measures
- (Black-box) how well does the summary help a user with a task?
- Problem: does summary quality correlate with performance?
- Studies: GMAT tests (Morris et al. 92); news analysis (Miike et al. 94); IR (Mani and Bloedorn 97); text categorization (SUMMAC 98; Sundheim 98)
- Evaluation in a specific task
- Can the summary be used instead of the document?
- Can the document be classified by reading the summary?
- Can we answer questions by reading the summary?
27. The Document Understanding Conference (DUC)
- This is really the Text Summarization Competition
- Started in 2001
- Task and Evaluation (for 2001-2004)
- Various target sizes were used (10-400 words)
- Both single- and multiple-document summaries were assessed
- Summaries were manually judged for both content and readability
- Each peer (human or automatic) summary was compared against a single model summary
- using SEE (http://www.isi.edu/~cyl/SEE/)
- estimates the percentage of information in the model that was covered in the peer
- Also used ROUGE (Lin 04) in 2004
- Recall-Oriented Understudy for Gisting Evaluation
- Uses counts of n-gram overlap between the candidate and a gold-standard summary; assumes fixed-length summaries
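A minimal sketch of the core ROUGE-N recall computation; the official toolkit adds stemming, stopword-removal options, and handling of multiple reference summaries.

```python
from collections import Counter

def rouge_n(candidate, reference, n=2):
    """ROUGE-N recall: overlapping n-gram count / reference n-gram count.

    candidate, reference: token lists for the system and gold summaries.
    """
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped matches
    total = sum(ref.values())
    return overlap / total if total else 0.0
```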
28. The Document Understanding Conference (DUC)
- Made a big change in 2005
- An extrinsic evaluation was proposed but rejected (write a natural-disaster summary)
- Instead, a complex question-focused summarization task that required summarizers to piece together information from multiple documents to answer a question or set of questions as posed in a DUC topic
- Topics also indicated a desired granularity of information
29. The Document Understanding Conference (DUC)
- Evaluation metrics for the new task
- Grammaticality
- Non-redundancy
- Referential clarity
- Focus
- Structure and Coherence
- Responsiveness (content-based evaluation)
- This was a difficult task to do well in.
30. Let's Make a Summarizer!
- Each person (or pair) writes code for one small part of the problem, using Kupiec et al.'s method
- We'll combine the parts in class
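One possible way the parts could fit together, sketched with the hypothetical helper names from the earlier slides (kupiec_features and train_probs refer to the sketches above, not required names; the fixed phrases and acronyms below are placeholders).

```python
def summarize(sentences, paragraph_starts, train_examples, compression=0.25):
    """End-to-end skeleton: featurize, train the Bayesian scorer on labeled
    data, score each sentence, and keep the top fraction as the extract."""
    score = train_probs(train_examples)                       # per-feature probabilities
    feats = kupiec_features(sentences, paragraph_starts,
                            fixed_phrases=["in conclusion"],  # placeholder phrase list
                            acronyms={"usa"})                 # placeholder acronym list
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(feats[i]), reverse=True)
    keep = sorted(ranked[:max(1, int(compression * len(sentences)))])
    return [sentences[i] for i in keep]
```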
31. Next Time
- More on Bayesian classification
- Other summarization approaches (Marcu paper)
- Multi-document summarization (Goldstein et al. paper)
- In-class summarizer!