Extractive Summarization of Meeting Recordings


Title: Extractive Summarization of Meeting Recordings


1
Extractive Summarization of Meeting Recordings
  • Gabriel Murray, Steve Renals, Jean Carletta
  • Centre for Speech Technology Research,
    University of Edinburgh, Edinburgh EH8 9LW,
    Scotland
  • Presenter: Yi-Ting Chen

2
Reference
  • Gabriel Murray, Steve Renals, Jean Carletta,
    "Extractive Summarization of Meeting Recordings,"
    Eurospeech 2005.
  • Jaime Carbonell, Jade Goldstein, "The Use of MMR,
    Diversity-Based Reranking for Reordering
    Documents and Producing Summaries," in
    Proceedings of ACM-SIGIR'98, Melbourne,
    Australia, August 1998.

3
Outline
  • Introduction
  • Summarization Approaches
    • Maximal Marginal Relevance (MMR)
    • Latent Semantic Analysis (LSA)
    • Feature-based Approaches
  • Experimental Setup
  • Results
  • Conclusion and Future Work

4
Introduction
  • Most work in speech summarization has been in the
    domain of broadcast news
  • It has been demonstrated that standard extractive
    text summarization techniques, using classifiers
    based on textual and structural features, work
    well on broadcast news
  • Summarizing conversational speech is
    substantially different from text summarization
  • In this paper we investigate extractive
    summarization of multiparty meetings, using the
    ICSI Meeting Corpus
  • Experiments were carried out using both human
    transcriptions and the output of an automatic
    speech recognizer
  • The quality of the summaries was evaluated
    using ROUGE

5
Summarization Approaches(1/5)
  • Maximal Marginal Relevance (MMR)
  • MMR is based on the vector-space model of text
    retrieval, and is well-suited to query-based and
    multi-document summarization
  • In MMR, sentences are chosen according to a
    weighted combination of their relevance to a
    query and their redundancy with the sentences
    that have already been extracted
  • Both relevance and redundancy are measured using
    cosine similarity

6
Summarization Approaches(2/5)
  • Maximal Marginal Relevance (MMR)
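  • Each candidate sentence S_i is scored as
    $Sc_{MMR}(i) = \lambda \cos(S_i, D) - (1 - \lambda) \cos(S_i, Summ)$
  • Where D is the average document vector, Summ is
    the average vector from the set of sentences
    already selected, and $\lambda$ weights relevance
    against redundancy
  • $\lambda$ is annealed, so that relevance is
    emphasized when the summary is still short, and
    as the summary grows longer the emphasis is
    increasingly put on minimizing redundancy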
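The greedy selection described on the two MMR slides can be sketched roughly as below. This is only an illustration under stated assumptions: each sentence is already a term-weight vector, the score is taken to be λ·cos(S_i, D) − (1 − λ)·cos(S_i, Summ), and λ falls linearly from `lam_start` to `lam_end`. The function names, the default λ values, and the linear schedule are hypothetical; the slides do not give the annealing schedule.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def mmr_select(sentences, k, lam_start=0.9, lam_end=0.5):
    """Greedily pick up to k sentence indices by an MMR-style score."""
    doc = mean_vector(sentences)              # D: average document vector
    chosen = []
    target = min(k, len(sentences))
    while len(chosen) < target:
        # Assumed schedule: lambda moves linearly from lam_start to lam_end.
        step = len(chosen) / max(k - 1, 1)
        lam = lam_start + (lam_end - lam_start) * step
        # Summ: average vector of the sentences selected so far.
        summ = mean_vector([sentences[i] for i in chosen]) if chosen else None
        best, best_score = None, -math.inf
        for i, s in enumerate(sentences):
            if i in chosen:
                continue
            redundancy = cosine(s, summ) if summ is not None else 0.0
            score = lam * cosine(s, doc) - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen
```

With three toy sentences where the first two are identical, `mmr_select([[1, 1, 0], [1, 1, 0], [0, 0, 1]], k=2)` picks one of the duplicates first and then the dissimilar third sentence, which is the redundancy penalty at work.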

7
Summarization Approaches(3/5)
  • Latent Semantic Analysis (LSA)
  • LSA is a vector-space approach which involves
    projection of the term-document matrix to a
    reduced dimension representation
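  • The term-document matrix A is decomposed by
    singular value decomposition as follows
    $A = U S V^{T}$
  • In the Gong and Liu approach, each of the top
    right singular vectors (rows of $V^{T}$) is
    treated as a topic, and the sentence with the
    highest value in that vector is extracted
  • Steinberger and Jezek have offered two strong
    criticisms of the Gong and Liu approach
  • First, the method described above ties the
    dimensionality reduction to the desired summary
    length
  • Second, a sentence may score highly in several
    dimensions but never win in any single one, so
    it is never extracted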

8
Summarization Approaches(4/5)
  • Latent Semantic Analysis (LSA)
  • This work addresses the same concerns while
    still following the Gong and Liu approach:
    rather than extracting only the best sentence
    for each topic, the n best sentences are
    extracted
  • The number of sentences in the summary that
    come from the first topic is determined by the
    percentage that the largest singular value
    represents out of the sum of all singular values,
    and so on for each topic
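The per-topic allocation described above can be illustrated with a small sketch. It assumes the singular values and the rows of V^T (one row per topic, one column per sentence) have already been computed elsewhere. The function names, the rounding rule, the handling of leftover slots, and ranking by raw (not absolute) value are all assumptions made for illustration, not details taken from the paper.

```python
def allocate_per_topic(singular_values, summary_len):
    """Split summary_len sentence slots across topics in proportion to
    each singular value's share of the sum of all singular values."""
    total = sum(singular_values)
    if not singular_values or total == 0:
        return []
    quotas = []
    remaining = summary_len
    for sv in singular_values:
        n = min(remaining, round(summary_len * sv / total))
        quotas.append(n)
        remaining -= n
    # Assumption: slots lost to rounding go to the strongest topic.
    quotas[0] += remaining
    return quotas

def lsa_extract(singular_values, vt, summary_len):
    """Pick sentence indices topic by topic, taking the n best unused
    sentences from each row of vt according to that topic's quota."""
    chosen = []
    quotas = allocate_per_topic(singular_values, summary_len)
    for topic, quota in enumerate(quotas):
        row = vt[topic]
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        picked = [j for j in ranked if j not in chosen][:quota]
        chosen.extend(picked)
    return chosen
```

For example, with singular values `[3, 1]` and a four-sentence summary, the first topic contributes three sentences and the second contributes one, so the summary length no longer has to equal the number of retained dimensions.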

9
Summarization Approaches(5/5)
  • Feature-based Approaches
  • 1. The first feature-based approach used
    Gaussian mixture models for the extracted and
    non-extracted classes
  • The prosodic features were the mean and standard
    deviation of F0, energy, and duration, all
    estimated and normalized at the word level, then
    averaged over the utterance
  • The lexical features were both TFIDF-based: the
    average and the maximum TFIDF score for the
    utterance
  • 2. The second feature-based approach created
    single LSA-based sentence scores
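As an illustration of the two lexical features, the sketch below computes the average and maximum TFIDF score over the words of one utterance. It is only one plausible reading: the slides do not specify the TFIDF variant, so the use of raw within-utterance term frequency, a log(N / df) inverse document frequency with utterances as the "documents", and the function names are all assumptions.

```python
import math
from collections import Counter

def idf_table(utterances):
    """Inverse document frequency per term, treating each utterance
    (a list of tokens) as one document."""
    n = len(utterances)
    df = Counter(term for utt in utterances for term in set(utt))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf_features(utterance, idf):
    """Return (average, maximum) TFIDF score over the utterance's tokens."""
    if not utterance:
        return 0.0, 0.0
    tf = Counter(utterance)
    # Terms missing from the idf table are given a score of 0.0.
    scores = [tf[w] * idf.get(w, 0.0) for w in utterance]
    return sum(scores) / len(scores), max(scores)
```

A word that occurs in every utterance gets an IDF of zero, so utterances made up mostly of such words receive low values on both features.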

10
Experimental Setup
  • Human summaries of the ICSI Meeting Corpus were
    used for evaluation and for training the
    feature-based approaches
  • An evaluation set of six meetings was defined and
    multiple human summaries were created for these
    meetings
  • The ROUGE evaluation approach was used
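For readers unfamiliar with ROUGE, the sketch below shows the basic ROUGE-N idea: n-gram recall of a machine summary against one or more human reference summaries, with matches clipped to the count in each reference. It is a simplified stand-in for the official ROUGE toolkit, with no stemming or stopword handling, and the function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n):
    """N-gram recall of candidate against a list of reference token lists."""
    cand = ngrams(candidate, n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        # Clip each match to the number of times the n-gram occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
    return matched / total if total else 0.0
```

ROUGE-1 through ROUGE-4, as mentioned in the results, correspond to calling this with `n` from 1 to 4; higher `n` rewards longer exact word sequences shared with the human summaries.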

11
Results(1/2)
  • All of the machine summaries were 10% of the
    original document length
  • Of the four approaches to summarization used
    herein, the latent semantic analysis method
    performed the best on every meeting tested for
    every ROUGE measure with the exception of ROUGE-3
    and ROUGE-4

12
Results(2/2)
  • Valenza et al. and Zechner and Waibel both
    observed that, in the case of broadcast news,
    the WER of extracted summaries was significantly
    lower than the overall WER
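The comparison above relies on word error rate (WER). As a reminder of how that quantity is usually computed, here is a minimal sketch: the word-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The function name is hypothetical and this is not the scoring tool used in the cited work.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with a word-level Levenshtein distance over token lists."""
    if not reference:
        return 0.0
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)
```

To reproduce the kind of comparison described, one would compute this once over the words of all utterances in a meeting and once over only the utterances selected for the summary.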

13
Conclusion and Future Work
  • Though the LSA method consistently performed the
    best, it was not a significant improvement over
    MMR and does not share some of the advantages of
    MMR
  • In the immediate future the focus will be on
    greatly expanding the prosodic database and on
    building various types of classifiers for the
    feature-based approach
  • A further goal is finding a method of automatic
    utterance detection