1
2000 TREC-9 Spoken Document Retrieval Track
http://www.nist.gov/speech/sdr2000
John Garofolo, Jerome Lard, Ellen Voorhees
National Institute of Standards and Technology, Information Technology Laboratory
2
SDR 2000 - Overview
  • SDR 2000 Track Overview, changes for TREC-9
  • SDR Collection/Topics
  • Technical Approaches
  • Speech Recognition Metrics/Performance
  • Retrieval Metrics/Performance
  • Conclusions
  • Future

3
Spoken Document Retrieval (SDR)
  • Task: given a text topic, retrieve a ranked list of
    relevant excerpts from a collection of recorded speech
  • Requires 2 core technologies:
  • Speech Recognition
  • Information Retrieval
  • First step towards multimedia information access
  • Focus is on the effect of recognition accuracy on
    retrieval performance (see the toy pipeline sketch below)
  • Domain: Radio and TV Broadcast News
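
A minimal sketch of how the two components fit together, assuming the
recognizer output is already available as plain text per story; the
function and data names below are illustrative, not from any track
system, and the scoring is simple term overlap rather than the OKAPI
weighting the sites actually used:

    # Toy SDR pipeline: ASR transcripts in, ranked story list out.
    def tokenize(text: str) -> list[str]:
        return text.lower().split()

    def retrieve(topic: str, transcripts: dict[str, str]) -> list[tuple[str, float]]:
        """Rank stories by how many topic terms appear in their (errorful) transcript."""
        topic_terms = set(tokenize(topic))
        scores = []
        for story_id, text in transcripts.items():
            words = tokenize(text)
            overlap = sum(1 for w in words if w in topic_terms)
            scores.append((story_id, overlap / (len(words) + 1)))
        return sorted(scores, key=lambda s: s[1], reverse=True)

    # Example: a terse topic against two hypothetical ASR transcripts.
    asr_output = {
        "CNN19980315.1": "the senate debated assisted suicide legislation today",
        "VOA19980316.2": "markets rallied after the interest rate announcement",
    }
    print(retrieve("assisted suicide laws", asr_output))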

4
SDR Evaluation Approach
  • In the TREC tradition:
  • Create a realistic but doable application task
  • Increase realism (and difficulty) each year
  • NIST creates:
  • infrastructure: test collection, queries, task
    definition, relevance judgements
  • the task includes several different control
    conditions: recognizer, boundaries, etc.
  • Sites submit:
  • speech recognizer transcripts for benchmarking
    and sharing
  • rank-ordered retrieval lists for scoring

5
Past SDR Test Collections
6
Past SDR Evaluation Conditions
7
SDR 2000 - Changes from 1999
  • 2000:
  • evaluated on whole shows, including non-news
    segments
  • 50 ad-hoc topics in two forms: short description
    and keyword
  • 1 baseline recognizer transcript set (NIST/BBN
    B2 from 1999)
  • story boundaries unknown (SU) condition is
    required
  • recognition and use of non-lexical information
  • 1999:
  • evaluated on hand-segmented news excerpts only
  • 49 ad-hoc-style topics/metrics
  • 2 baseline recognizer transcript sets (NIST/BBN)
  • story boundaries known (SK) focus condition and
    exploratory boundaries-unknown (SU) condition

8
SDR 2000 - Test Collection
  • Based on the LDC TDT-2 Corpus
  • 4 sources (TV: ABC, CNN; Radio: PRI, VOA)
  • February through June 1998 subset, 902 broadcasts
  • 557.5 hours, 21,754 stories, 6,755 filler and
    commercial segments (55 hours)
  • Reference transcripts:
  • Human-annotated story boundaries
  • Full-broadcast word transcription
  • News segments hand-transcribed (same as in '99)
  • Commercials and non-news filler transcribed via
    NIST ROVER applied to 3 automatic recognizer
    transcript sets
  • Word times provided by LIMSI forced alignment
  • Automatic recognition of non-lexical information
    (commercials, repeats, gender, bandwidth,
    non-speech, signal energy, and combinations)
    provided by CU

9
Test Variables
  • Collection:
  • Reference (R1) - transcripts created by LDC human
    annotators
  • Baseline (B1) - transcripts created by NIST/BBN
    time-adaptive automatic recognizer
  • Speech (S1/S2) - transcripts created by sites'
    own automatic recognizers
  • Cross-Recognizer (CR) - all contributed
    recognizers
  • Boundaries:
  • Known (K) - story boundaries provided by LDC
    annotators
  • Unknown (U) - story boundaries unknown

10
Test Variables (cont'd)
  • Queries:
  • Short (S) - 1- or 2-phrase description of the
    information need
  • Terse (T) - keyword list
  • Non-Lexical Information:
  • Default - could make use of automatically-recognized
    features
  • None (N) - no non-lexical information (control)
  • Recognition language models:
  • Fixed (FLM) - fixed language model/vocabulary
    predating the test epoch
  • Rolling (RLM) - time-adaptive language
    model/vocabulary using daily newswire texts

11
Test Conditions
  • Primary Conditions (may use non-lexical side info,
    but must run the contrast below; the condition codes
    compose as sketched after this list)
  • R1SU - Reference Retrieval, short topics, using
    human-generated "perfect" transcripts without
    known story boundaries
  • R1TU - Reference Retrieval, terse topics, using
    human-generated "perfect" transcripts without
    known story boundaries
  • B1SU - Baseline Retrieval, short topics, using
    provided recognizer transcripts without known
    story boundaries
  • B1TU - Baseline Retrieval, terse topics, using
    provided recognizer transcripts without known
    story boundaries
  • S1SU - Speech Retrieval, short topics, using own
    recognizer without known story boundaries
  • S1TU - Speech Retrieval, terse topics, using own
    recognizer without known story boundaries
  • Optional Cross-Recognizer Condition (may use
    non-lexical side info, but must run the contrast
    below)
  • CRSU-<SYS_NAME> - Cross-Recognizer Retrieval,
    short topics, using other participants'
    recognizer transcripts without known story
    boundaries
  • CRTU-<SYS_NAME> - Cross-Recognizer Retrieval,
    terse topics, using other participants'
    recognizer transcripts without known story
    boundaries
  • Conditional No-Non-Lexical-Information Condition
    (required contrast if non-lexical information is
    used in other conditions)
  • R1SUN - Reference Retrieval, short topics, using
    human-generated "perfect" transcripts without
    known story boundaries, no non-lexical info
  • R1TUN - Reference Retrieval, terse topics, using
    human-generated "perfect" transcripts without
    known story boundaries, no non-lexical info
  • B1SUN - Baseline Retrieval, short topics, using
    provided recognizer transcripts without known
    story boundaries, no non-lexical info
  • B1TUN - Baseline Retrieval, terse topics, using
    provided recognizer transcripts without known
    story boundaries, no non-lexical info
  • S1SUN - Speech Retrieval, short topics, using own
    recognizer without known story boundaries, no
    non-lexical info
  • S1TUN - Speech Retrieval, terse topics, using own
    recognizer without known story boundaries, no
    non-lexical info
  • S2SUN - Speech Retrieval, short topics, using own
    second recognizer without known story boundaries,
    no non-lexical info
  • S2TUN - Speech Retrieval, terse topics, using own
    second recognizer without known story boundaries,
    no non-lexical info
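
The condition labels above compose systematically from the test variables
on the preceding slides (collection, topic form, boundary condition,
optional no-non-lexical flag, optional recognizer name). A small sketch of
that decomposition, assuming the naming pattern as listed; the parser
itself is illustrative, not a NIST tool:

    import re

    # Decompose a TREC-9 SDR condition code such as "B1SUN" or "CRTU-LIMSI"
    # into its test variables, following the naming scheme on slides 9-11.
    COLLECTIONS = {"R1": "reference transcripts", "B1": "baseline recognizer",
                   "S1": "own recognizer", "S2": "own second recognizer",
                   "CR": "cross-recognizer"}

    def parse_condition(code: str) -> dict[str, str]:
        m = re.fullmatch(r"(R1|B1|S1|S2|CR)([ST])U(N?)(?:-(\w+))?", code)
        if not m:
            raise ValueError(f"unrecognized condition code: {code}")
        coll, topic, nolex, sysname = m.groups()
        return {
            "collection": COLLECTIONS[coll],
            "topics": "short" if topic == "S" else "terse",
            "boundaries": "unknown",          # all 2000 conditions are boundary-unknown
            "non_lexical": "not used" if nolex else "allowed",
            "recognizer": sysname or "n/a",   # named only for cross-recognizer runs
        }

    print(parse_condition("B1SUN"))
    print(parse_condition("CRTU-LIMSI"))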

12
Test Topics
  • 50 topics developed by NIST assessors using an
    approach similar to the TREC Ad Hoc Task
  • Short and terse forms of each topic were generated
  • Hard - Topic 125, 10 relevant stories
  • Short: Provide information pertaining to
    security violations within the U.S. intelligence
    community. (.024 average MAP)
  • Terse: U.S. intelligence violations (.019
    average MAP)
  • Medium - Topic 143, 8 relevant stories
  • Short: How many Americans file for bankruptcy
    each year? (.505 average MAP)
  • Terse: Americans bankruptcy debts (.472 average
    MAP)
  • Easy - Topic 127, 11 relevant stories
  • Short: Name some countries which permit their
    citizens to commit suicide with medical
    assistance. (.887 average MAP)
  • Terse: assisted suicide (.938 average MAP)

13
Test Topic Relevance
14
Topic Difficulty
15
Participants

Full SDR (recognition and retrieval):
  • Cambridge University, UK
  • LIMSI, France
  • Sheffield University, UK
16
Approaches for 2000
  • Automatic Speech Recognition:
  • HMM, word-based - most sites
  • NN/HMM hybrid-based - Sheffield
  • Retrieval:
  • OKAPI probabilistic model - all sites (a minimal
    BM25 sketch follows below)
  • blind relevance feedback and parallel-corpus BRF
    for query expansion - all sites
  • Story-boundary-unknown retrieval:
  • passage windowing, retrieval, and merging - all sites
  • Use of automatically-recognized non-lexical
    features:
  • repeat and commercial detection - CU
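
A minimal sketch of the OKAPI BM25 weighting that all sites built on; the
k1 and b values are common defaults rather than the sites' tuned settings,
the tiny in-memory "collection" is a stand-in, and blind relevance
feedback would be layered on top of this basic scoring:

    import math
    from collections import Counter

    # Minimal OKAPI BM25 scoring over a small in-memory collection.
    def bm25_rank(query_terms, docs, k1=1.2, b=0.75):
        N = len(docs)
        doc_tokens = {d: text.lower().split() for d, text in docs.items()}
        avgdl = sum(len(t) for t in doc_tokens.values()) / N
        df = Counter()                       # document frequency per term
        for toks in doc_tokens.values():
            df.update(set(toks))
        scores = {}
        for d, toks in doc_tokens.items():
            tf = Counter(toks)
            score = 0.0
            for q in query_terms:
                if df[q] == 0:
                    continue
                idf = math.log(1 + (N - df[q] + 0.5) / (df[q] + 0.5))
                score += idf * tf[q] * (k1 + 1) / (
                    tf[q] + k1 * (1 - b + b * len(toks) / avgdl))
            scores[d] = score
        return sorted(scores.items(), key=lambda s: s[1], reverse=True)

    docs = {"s1": "assisted suicide law passed in oregon",
            "s2": "bankruptcy filings rose again this year"}
    print(bm25_rank("assisted suicide".split(), docs))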

17
ASR Metrics
  • Traditional ASR metric:
  • Word Error Rate (WER) and Mean Story Word Error
    Rate (SWER) using SCLITE and LDC reference transcripts
  • WER = (word insertions + word deletions + word
    substitutions) / total words in reference
    (sketched below)
  • LDC created 2 Hub-4-compliant 10-hour subsets for
    ASR scoring and analyses (LDC-SDR-99 and
    LDC-SDR-2000)
  • Note that there is a 10.3% WER in the collection's
    human (closed-caption) transcripts

Note: SDR recognition is not directly comparable
to Hub-4 benchmarks due to transcript quality,
test set selection method, and the word mapping
method used in scoring
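
The WER computation reduces to an edit-distance alignment between the
reference and hypothesis word strings; a bare-bones sketch follows
(SCLITE performs the real scoring, with its own text normalization and
word-mapping rules):

    # Word error rate via Levenshtein alignment over words:
    # WER = (insertions + deletions + substitutions) / reference word count.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i                      # deletions
        for j in range(len(hyp) + 1):
            dp[0][j] = j                      # insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                               dp[i][j - 1] + 1,        # insertion
                               dp[i - 1][j - 1] + sub)  # substitution / match
        return dp[len(ref)][len(hyp)] / len(ref)

    print(wer("the senate debated assisted suicide today",
              "the senate debate assisted suicide"))    # 1 sub + 1 del -> 2/6
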
18
ASR Performance
19
IR Metrics
  • Traditional TREC ad-hoc metric:
  • Mean Average Precision (MAP) using TREC_EVAL
    (sketched below)
  • Created assessment pools for each topic using the
    top 100 documents from all retrieval runs
  • Mean pool size: 596 (2.1% of all segments)
  • Min pool size: 209
  • Max pool size: 1,309
  • NIST assessors created reference relevance
    assessments from the topic pools
  • Somewhat artificial for boundary-unknown
    conditions
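
In essence, TREC_EVAL's Mean Average Precision is the mean over topics of
the precision at each rank where a relevant story is retrieved; a compact
sketch with made-up relevance judgments and a toy ranked run:

    # MAP: mean over topics of the average of precision@rank at each relevant hit.
    def average_precision(ranked_ids, relevant_ids):
        hits, precision_sum = 0, 0.0
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / len(relevant_ids) if relevant_ids else 0.0

    def mean_average_precision(runs, qrels):
        return sum(average_precision(runs[t], qrels[t]) for t in qrels) / len(qrels)

    # Toy example: two topics, their relevance pools, and a system's ranked lists.
    qrels = {"125": {"s3", "s7"}, "127": {"s1"}}
    runs  = {"125": ["s1", "s3", "s7"], "127": ["s1", "s9"]}
    print(mean_average_precision(runs, qrels))   # (0.583 + 1.0) / 2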

20
Story Boundaries Known Condition
  • Retrieval using pre-segmented news stories
  • systems given index of story boundaries for
    recognition with IDs for retrieval
  • excluded non-news segments
  • stories are treated as documents
  • systems produce rank-ordered list of Story IDs
  • document-based scoring
  • score as in other TREC Ad Hoc tests using
    TREC_EVAL

21
Story Boundaries Known Retrieval Condition
22
Unknown Story Boundary Condition
  • Retrieval using continuous speech stream
  • systems process entire broadcasts for ASR and
    retrieval with no provided segmentation
  • systems output a single time marker for each
    relevant excerpt to indicate topical passages
  • this task does NOT attempt to determine topic
    boundaries
  • time-based scoring (sketched below)
  • map each time marker to a story ID (dummy ID for
    retrieved non-stories and duplicates)
  • score as usual using TREC_EVAL
  • penalizes duplicate retrieved stories
  • story-based scoring is somewhat artificial but
    expedient
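
A sketch of the time-based scoring step: each retrieved time marker is
mapped to the reference story containing it, and later hits on an
already-retrieved story (or hits outside any story) map to a dummy ID so
they count as non-relevant. The data layout here is assumed for
illustration; the actual scoring scripts are NIST's:

    # Map retrieved (show, time) markers to reference story IDs for scoring.
    # reference_stories: per show, (start, end, story_id) spans from the
    # human-annotated boundaries; times are seconds into the broadcast.
    def map_to_story_ids(retrieved, reference_stories):
        seen, mapped, dummy = set(), [], 0
        for show, t in retrieved:            # retrieved list is in rank order
            story = next((sid for start, end, sid in reference_stories.get(show, [])
                          if start <= t < end), None)
            if story is None or story in seen:
                dummy += 1                   # non-story or duplicate: scored non-relevant
                mapped.append(f"DUMMY_{dummy}")
            else:
                seen.add(story)
                mapped.append(story)
        return mapped                        # feed this ID list to TREC_EVAL as usual

    reference_stories = {"ABC19980601": [(0.0, 95.2, "ABC19980601.1"),
                                         (95.2, 180.0, "ABC19980601.2")]}
    retrieved = [("ABC19980601", 40.0), ("ABC19980601", 50.0), ("ABC19980601", 120.0)]
    print(map_to_story_ids(retrieved, reference_stories))
    # -> ['ABC19980601.1', 'DUMMY_1', 'ABC19980601.2']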

23
Story Boundaries Unknown Retrieval Condition
24
SDR-2000 Cross-Recognizer Results
25
Conclusions
  • ad hoc retrieval in the broadcast news domain
    appears to be a solved problem
  • systems perform well at finding relevant passages
    in transcripts produced by a variety of
    recognizers on full, unsegmented news broadcasts
  • performance on a site's own recognizer is comparable
    to the human reference
  • just beginning to investigate the use of non-lexical
    information
  • Caveat emptor:
  • ASR may still pose serious problems for the Question
    Answering domain, where content errors are fatal

26
Future for Multi-Media Retrieval?
  • SDR Track will be sunset
  • Other opportunities
  • TREC
  • Question Answering Track
  • New Video Retrieval Track
  • CLEF
  • Cross-language SDR
  • TDT Project

27
TREC-9 SDR Results, Primary Conditions
28
TREC-9 SDR Results, Cross Recognizer Conditions