Summarization as Category-Based Text Reduction

1
Summarization as Category-Based Text Reduction
  • Günther Fliedl Christian Winkler
  • fliedl_at_ifit.uni-klu.ac.at
  • christian.winkler_at_uni-klu.ac.at

2
Automatic Summarization
  • Usually a Non-Linguistic Approach
  • Keyword Listing
  • Paragraph Minimizing
  • Collocation Extraction

3
Symbolic Methods of Minimizing Textual Units
(Linguistically Motivated)
  • Argument Structure Filtering
  • Phrase Chunking
  • Rudimentary Types of Lemmatizing

4
Our Proposal: POS-Driven
  • Linguistically Motivated by
  • POS Tagging
  • Listing and Filtering of Relevant Words
  • XML-Based Text Reduction
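
The reduction step listed above can be pictured as filtering an XML-style tagged string by category. The following is a minimal sketch, not the authors' implementation: the function name `reduce_text`, the lowercase tag names (`nn`, `nns`, `nnp`) and the sample sentence are assumptions made for illustration.

```python
import re

# Matches one XML-style tagged token such as <nn>dog</nn>.
# The tag inventory used below is assumed, not taken from the slides.
TOKEN = re.compile(r"<(\w+)>(.*?)</\1>")


def reduce_text(tagged, keep=("nn", "nns", "nnp")):
    """Keep only the words whose tag is in `keep`, in original order."""
    return [word for tag, word in TOKEN.findall(tagged) if tag in keep]


# Hypothetical tagged input.
sample = (
    "<det>The</det> <jj>online</jj> <nn>tagger</nn> "
    "<vbz>assigns</vbz> <nns>tags</nns>"
)

print(reduce_text(sample))  # ['tagger', 'tags']
```

Changing `keep` changes which categories survive, which is the sense in which the reduction is category-based.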

5
Features of the Tagging System
  • Probability-Based, Corpus-Trained
  • Assigns POS Tags to English Text
  • Appropriate Tags Based on Conditional
    Probabilities
  • Extracts as Many Nouns and Noun Phrases as
    Possible
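
One way to read "appropriate tags based on conditional probabilities" is a greedy choice that weighs how likely a tag is for a word against how likely that tag is after the previous tag. The sketch below illustrates that idea only. All probabilities, the tiny lexicon, and the name `tag_greedy` are invented for the example; a corpus-trained tagger would estimate such values from data and may combine them differently.

```python
# Toy numbers, invented for illustration only.
# LEXICON[word][tag]     ~ how likely `tag` is for `word`
# TRANSITIONS[prev][tag] ~ how likely `tag` is right after `prev`
LEXICON = {
    "the": {"det": 1.0},
    "dog": {"nn": 0.9, "vb": 0.1},
    "can": {"md": 0.7, "nn": 0.2, "vb": 0.1},
    "run": {"vb": 0.8, "nn": 0.2},
}
TRANSITIONS = {
    "<s>": {"det": 0.6, "nn": 0.3, "md": 0.05, "vb": 0.05},
    "det": {"nn": 0.9, "vb": 0.05, "md": 0.05},
    "nn": {"md": 0.4, "vb": 0.4, "nn": 0.2},
    "md": {"vb": 0.9, "nn": 0.1},
    "vb": {"det": 0.5, "nn": 0.4, "vb": 0.1},
}


def tag_greedy(words):
    """Tag left to right, picking the best tag given the previous tag."""
    prev = "<s>"  # sentence-start marker
    tagged = []
    for word in words:
        # Unknown words default to a noun reading (an assumption).
        candidates = LEXICON.get(word.lower(), {"nn": 1.0})
        best = max(
            candidates,
            key=lambda t: candidates[t] * TRANSITIONS[prev].get(t, 1e-6),
        )
        tagged.append((word, best))
        prev = best
    return tagged


print(tag_greedy(["the", "dog", "can", "run"]))
# [('the', 'det'), ('dog', 'nn'), ('can', 'md'), ('run', 'vb')]
print(tag_greedy(["the", "can"]))
# [('the', 'det'), ('can', 'nn')]
```

The second call shows the effect of context: "can" is tagged as a noun after a determiner even though its modal reading is more frequent in the toy lexicon.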

6
Used Methods (1)
  • add_tags TEXT
  • Examine the string provided and return it fully
    tagged ( XML style )
  • get_words TEXT
  • Given a text string, return as many nouns and
    noun phrases as possible. Applies add_tags and
    involves three stages.

7
Used Methods (2)
  • get_readable TEXT
  • Return an easy-on-the-eyes tagged version of a
    text string. Applies add_tags and reformats to be
    easier to read.
  • get_sentences TEXT
  • Returns an anonymous array of sentences (without
    POS tags) from a text.
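
The two operations on this slide can be approximated in a few lines. This is a rough sketch under assumptions: the `word/TAG` output layout, the XML-style input, and the names `readable` and `split_sentences` are illustrative and are not taken from the tagger itself. The sentence splitter is deliberately naive and will break on abbreviations such as "e.g.".

```python
import re

# Matches one XML-style tagged token such as <nn>dog</nn> (assumed format).
TOKEN = re.compile(r"<(\w+)>(.*?)</\1>")


def readable(tagged):
    """Rewrite <tag>word</tag> tokens as word/TAG, separated by spaces."""
    return " ".join(
        f"{word}/{tag.upper()}" for tag, word in TOKEN.findall(tagged)
    )


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


print(readable("<det>The</det> <nn>dog</nn> <vbd>barked</vbd>"))
# The/DET dog/NN barked/VBD
print(split_sentences("Tag the text. Then filter it! Done?"))
# ['Tag the text.', 'Then filter it!', 'Done?']
```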

8
Used Methods (3)
  • get_proper_nouns TAGGED_TEXT
  • Given a POS-tagged text, this method returns a
    hash of all proper nouns and their occurrence
    frequencies. The method is greedy and will return
    multi-word phrases, if possible, so it would find
    "Linguistic Data Consortium" as a single unit,
    rather than as three individual proper nouns.
    This method does not stem the found words.
  • get_nouns TAGGED_TEXT
  • Given a POS-tagged text, this method returns all
    nouns and their occurrence frequencies.
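
The greedy behaviour described for proper nouns can be sketched as grouping consecutive proper-noun tokens into one phrase and counting each phrase. This is an illustration only: the `nnp` tag name, the XML-style input, the sample sentence, and the function name `proper_nouns` are assumptions, and the real method may differ in detail.

```python
import re
from collections import Counter

# Matches one XML-style tagged token such as <nnp>Data</nnp> (assumed format).
TOKEN = re.compile(r"<(\w+)>(.*?)</\1>")


def proper_nouns(tagged):
    """Count maximal runs of adjacent proper nouns as single phrases."""
    counts = Counter()
    run = []  # words of the proper-noun run currently being collected
    for tag, word in TOKEN.findall(tagged):
        if tag == "nnp":
            run.append(word)
        elif run:
            counts[" ".join(run)] += 1
            run = []
    if run:  # flush a run that reaches the end of the text
        counts[" ".join(run)] += 1
    return counts


# Hypothetical tagged input.
sample = (
    "<det>The</det> <nnp>Linguistic</nnp> <nnp>Data</nnp> "
    "<nnp>Consortium</nnp> <vbz>hosts</vbz> <nns>corpora</nns> "
    "<in>in</in> <nnp>Philadelphia</nnp>"
)

print(dict(proper_nouns(sample)))
# {'Linguistic Data Consortium': 1, 'Philadelphia': 1}
```

As on the slide, no stemming is applied: the phrases are counted exactly as they appear.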

9
Used Methods (4)
  • get_max_noun_phrases TAGGED_TEXT
  • Given a POS-tagged text, this method returns only
    the maximal noun phrases. May be called directly,
    but is also used by get_noun_phrases.
  • get_noun_phrases TAGGED_TEXT
  • Similar to get_words, but requires a POS-tagged
    text as an argument.
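
A simplified reading of "maximal noun phrase" is the longest unbroken run of adjectives and nouns that ends in a noun. The sketch below implements only that simplification. The tag sets, the sample sentence, and the name `max_noun_phrases` are assumptions for illustration; the actual method may accept more patterns, such as determiners or prepositions inside the phrase.

```python
import re

# Matches one XML-style tagged token such as <jj>big</jj> (assumed format).
TOKEN = re.compile(r"<(\w+)>(.*?)</\1>")
NOUN_TAGS = {"nn", "nns", "nnp"}  # assumed noun tags
MODIFIER_TAGS = {"jj"}            # assumed modifier tags


def max_noun_phrases(tagged):
    """Return the longest adjective/noun runs that end in a noun."""
    phrases = []
    run = []  # (tag, word) pairs of the run currently being collected

    def flush():
        # Drop trailing modifiers so that the phrase ends in a noun.
        while run and run[-1][0] not in NOUN_TAGS:
            run.pop()
        if run:
            phrases.append(" ".join(word for _, word in run))
        run.clear()

    for tag, word in TOKEN.findall(tagged):
        if tag in NOUN_TAGS or tag in MODIFIER_TAGS:
            run.append((tag, word))
        else:
            flush()
    flush()
    return phrases


# Hypothetical tagged input.
sample = (
    "<det>The</det> <jj>statistical</jj> <nn>tagger</nn> "
    "<vbz>extracts</vbz> <jj>maximal</jj> <nn>noun</nn> "
    "<nns>phrases</nns> <in>from</in> <jj>English</jj> <nn>text</nn>"
)

print(max_noun_phrases(sample))
# ['statistical tagger', 'maximal noun phrases', 'English text']
```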

10
(No Transcript)
11
Online Tagger
The Online Tagger
http://nlp.ifit.uni-klu.ac.at/NLP/
http://search.cpan.org/src/ACOBURN/Lingua-EN-Tagger-0.13/README
http://nlp.ifit.uni-klu.ac.at/NLP/index.jsp?name=EN-Tagger
12
Filtering
13
  • QUESTIONS?