Combining Multiple Models for Speech Information Retrieval - PowerPoint PPT Presentation

Learn more at: http://www.lrec-conf.org
1
Combining Multiple Models for Speech Information
Retrieval
  • Muath Alzghool and Diana Inkpen

University of Ottawa Canada
2
Presentation Outline
  • Task: Speech Information Retrieval.
  • Data: the Mallach collection (Oard et al., 2004).
  • System description.
  • Model fusion.
  • Experiments using model fusion.
  • Results of the cross-language experiments.
  • Results of manual keywords and summaries.
  • Conclusion and future work.

3
The Mallach collection
  • Used in the Cross-Language Speech Retrieval
    (CLSR) task at Cross-Language Evaluation Forum
    (CLEF) 2007.
  • 8104 documents (segments) from 272 interviews
    with Holocaust survivors, totaling 589 hours of
    speech; ASR transcripts with a word error rate
    of 25-38%.
  • Additional metadata: automatically-assigned
    keywords, manually-assigned keywords, and a
    manual 3-sentence summary.
  • A set of 63 training topics and 33 test topics,
    created in English from actual user requests and
    translated into Czech, German, French, and
    Spanish by native speakers.
  • Relevance judgments were generated by standard
    pooling.
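Standard pooling can be sketched in a few lines: the union of the top-ranked documents from each submitted run is assessed, and everything outside the pool is assumed non-relevant. The `depth` value below is a placeholder (the actual pool depth used for this collection is not stated on the slide).

```python
def build_pool(runs, depth=100):
    """Standard pooling: take the union of the top-`depth`
    documents from each submitted run; only pooled documents
    are judged for relevance by assessors."""
    pool = set()
    for ranked_list in runs:          # each run is a ranked list of doc ids
        pool.update(ranked_list[:depth])
    return pool

# Toy example with two short runs and a shallow pool:
pool = build_pool([["d1", "d2", "d3"], ["d2", "d4"]], depth=2)
```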

4
Segments
5
Example topic (English)
6
Example topic (French)
7
System Description
  • SMART: Vector Space Model (VSM).
  • Terrier: Divergence from Randomness models (DFR).
  • Two query expansion methods:
  • Based on a thesaurus (novel technique).
  • Blind relevance feedback (12 terms from the top
    15 documents) based on the Bose-Einstein 1 model
    (Bo1 from Terrier).
  • Model fusion: sum of normalized weighted
    similarity scores (novel way to compute weights).
  • Combined output of 7 machine translation tools.
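The Bo1 blind relevance feedback step above can be sketched as follows. This is a simplified rendering of Terrier's standard Bo1 term weight; the helper names and the way expansion terms are appended to the query are illustrative assumptions, and details such as reweighting the expanded query are omitted.

```python
import math

def bo1_weight(tf_x, F, N):
    """Bo1 (Bose-Einstein 1) expansion-term weight:
    tf_x: term frequency in the pseudo-relevant set,
    F: term frequency in the whole collection,
    N: number of documents in the collection."""
    p_n = F / N  # expected term frequency under the model
    return tf_x * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)

def expand_query(query_terms, feedback_tfs, coll_freqs, N, n_terms=12):
    """Score candidate terms from the pseudo-relevant documents
    (e.g. the top 15 retrieved) and append the best n_terms."""
    scored = {t: bo1_weight(tf, coll_freqs[t], N)
              for t, tf in feedback_tfs.items()}
    top = sorted(scored, key=scored.get, reverse=True)[:n_terms]
    return list(query_terms) + [t for t in top if t not in query_terms]

# Hypothetical frequencies; 8104 matches the collection size above.
q = expand_query(["war"],
                 {"camp": 6, "ghetto": 4, "war": 2},
                 {"camp": 50, "ghetto": 30, "war": 500},
                 N=8104, n_terms=2)
```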

8
Model Fusion
  • Combine the results of different retrieval
    strategies from SMART (14 runs) and Terrier (1
    run).
  • Each technique retrieves different sets of
    relevant documents; therefore, combining the
    results could produce a better result than any
    of the individual techniques.
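The fusion described above (sum of normalized weighted similarity scores) can be sketched as below. The slides describe a novel way of computing the per-run weights, which is not given here, so the weights are placeholders; min-max normalization is also an assumption.

```python
def min_max_normalize(run):
    """Scale one run's scores into [0, 1] so runs are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}

def weighted_fusion(runs, weights):
    """Sum of normalized, weighted similarity scores across runs
    (e.g. 14 SMART runs and 1 Terrier run)."""
    fused = {}
    for run, w in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + w * score
    # Rank documents by fused score, best first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

run_a = {"d1": 12.0, "d2": 8.0, "d3": 4.0}   # e.g. a SMART run
run_b = {"d2": 0.9, "d3": 0.7, "d4": 0.2}    # e.g. the Terrier run
ranked = weighted_fusion([run_a, run_b], [1.0, 1.0])  # placeholder weights
```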

9
Experiments using Model Fusion
  • Applied the data fusion methods to 14 runs
    produced by SMART and one run produced by
    Terrier.
  • The change is given with respect to the run
    providing the better performance in each
    combination on the training data.
  • Model fusion helps to improve the performance
    (MAP and Recall scores) on the test data.
  • Monolingual (English): 6.5% improvement (not
    statistically significant).
  • Cross-language (French): 21.7% improvement
    (statistically significant).
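The MAP figures quoted above follow the usual definition: the mean, over topics, of the average precision of each ranked list. A minimal sketch (function names are illustrative, not from the slides):

```python
def average_precision(ranked_docs, relevant):
    """AP: mean of the precision values at each rank where a
    relevant document appears, divided by |relevant|."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs_by_topic, qrels):
    """MAP: average of per-topic AP over all judged topics."""
    aps = [average_precision(runs_by_topic[t], qrels[t]) for t in qrels]
    return sum(aps) / len(aps)

ap = average_precision(["d1", "d2", "d3"], {"d1", "d3"})
```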

10
Experiments using Model Fusion (MAP)
11
Experiments using Model Fusion (Recall)
12
Results of the cross-language experiments
  • The cross-language results for French are very
    close to Monolingual (English) on training data
    (the difference is not significant), but not on
    test data (the difference is significant).
  • The difference is significant between
    cross-language results for Spanish and
    Monolingual (English) on training data but not on
    test data (the difference is not significant).

13
Results of manual keywords and summaries
  • Experiments on manual keywords and manual
    summaries showed large improvements compared to
    Auto-English.
  • Our results (for manual and automatic runs) are
    the highest to date on this data collection in
    CLEF/CLSR.

14
Conclusion and future work
  • Model fusion improves retrieval significantly
    for some experiments (Auto-French) and not
    significantly for others (Auto-English).
  • The idea of using multiple translations proved
    to be good (based on previous experiments).
  • Future work:
  • Investigate more methods of model fusion.
  • Remove or correct some of the speech
    recognition errors in the ASR content words.