Title: National Institute of Standards and Technology
1. 2000 TREC-9 Spoken Document Retrieval Track
http://www.nist.gov/speech/sdr2000
John Garofolo, Jerome Lard, Ellen Voorhees
National Institute of Standards and Technology, Information Technology Laboratory
2. SDR 2000 - Overview
- SDR 2000 Track Overview, changes for TREC-9
- SDR Collection/Topics
- Technical Approaches
- Speech Recognition Metrics/Performance
- Retrieval Metrics/Performance
- Conclusions
- Future
3. Spoken Document Retrieval (SDR)
- Task
  - Given a text topic, retrieve ranked list of relevant excerpts from collection of recorded speech
- Requires 2 core technologies
  - Speech Recognition
  - Information Retrieval
- First step towards multimedia information access
- Focus is on effect of recognition accuracy on retrieval performance
- Domain: Radio and TV Broadcast News
4. SDR Evaluation Approach
- In the TREC tradition
  - Create real-ish but doable application task
  - Increase realism (and difficulty) each year
- NIST creates
  - infrastructure: test collection, queries, task definition, relevance judgements
  - task includes several different control conditions: recognizer, boundaries, etc.
- Sites submit
  - speech recognizer transcripts for benchmarking and sharing
  - rank-ordered retrieval lists for scoring
5. Past SDR Test Collections
6. Past SDR Evaluation Conditions
7. SDR 2000 - Changes from 1999
- 2000
  - evaluated on whole shows, including non-news segments
  - 50 ad-hoc topics in two forms: short description and keyword
  - 1 baseline recognizer transcript set (NIST/BBN B2 from 1999)
  - story boundaries unknown (SU) condition is required
  - recognition and use of non-lexical information
- 1999
  - evaluated on hand-segmented news excerpts only
  - 49 ad-hoc-style topics/metrics
  - 2 baseline recognizer transcript sets (NIST/BBN)
  - story boundaries known (SK) focus and exploratory unknown (SU) conditions
8. SDR 2000 - Test Collection
- Based on the LDC TDT-2 Corpus
  - 4 sources (TV: ABC, CNN; Radio: PRI, VOA)
  - February through June 1998 subset, 902 broadcasts
  - 557.5 hours, 21,754 stories, 6,755 filler and commercial segments (55 hours)
- Reference transcripts
  - Human-annotated story boundaries
  - Full broadcast word transcription
  - News segments hand-transcribed (same as in 99)
  - Commercials and non-news filler transcribed via NIST ROVER applied to 3 automatic recognizer transcript sets (see the voting sketch below)
  - Word times provided by LIMSI forced alignment
- Automatic recognition of non-lexical information (commercials, repeats, gender, bandwidth, non-speech, signal energy, and combinations) provided by CU
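For orientation, a much-simplified illustration of the voting stage behind a ROVER-style combination of the 3 recognizer transcripts. Real ROVER first aligns the hypotheses into a word transition network by dynamic programming and can weight votes by word confidence; here the transcripts are assumed to be pre-aligned (with "" marking a null slot), and the function name is illustrative.

```python
from collections import Counter

def rover_vote(aligned_hypotheses):
    """Majority vote per aligned word slot across already-aligned transcripts."""
    combined = []
    for slot in zip(*aligned_hypotheses):
        winner, _ = Counter(slot).most_common(1)[0]   # most frequent word in slot
        if winner:                                    # drop winning null slots
            combined.append(winner)
    return combined

hyps = [
    ["the", "white", "house", "",     "said"],
    ["a",   "white", "house", "",     "said"],
    ["the", "white", "house", "aide", "said"],
]
print(" ".join(rover_vote(hyps)))   # "the white house said"
```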
9. Test Variables
- Collection
  - Reference (R1) - transcripts created by LDC human annotators
  - Baseline (B1) - transcripts created by NIST/BBN time-adaptive automatic recognizer
  - Speech (S1/S2) - transcripts created by sites' own automatic recognizers
  - Cross-Recognizer (CR) - all contributed recognizers
- Boundaries
  - Known (K) - story boundaries provided by LDC annotators
  - Unknown (U) - story boundaries unknown
10. Test Variables (cont'd)
- Queries
  - Short (S) - 1- or 2-phrase description of information need
  - Terse (T) - keyword list
- Non-Lexical Information
  - Default - could make use of automatically-recognized features
  - None (N) - no non-lexical information (control)
- Recognition language models
  - Fixed (FLM) - fixed language model/vocabulary predating test epoch
  - Rolling (RLM) - time-adaptive language model/vocabulary using daily newswire texts
11. Test Conditions
- Primary Conditions (may use non-lexical side info, but must run contrast below; the condition codes are unpacked in the sketch after this list)
  - R1SU: Reference Retrieval, short topics, using human-generated "perfect" transcripts without known story boundaries
  - R1TU: Reference Retrieval, terse topics, using human-generated "perfect" transcripts without known story boundaries
  - B1SU: Baseline Retrieval, short topics, using provided recognizer transcripts without known story boundaries
  - B1TU: Baseline Retrieval, terse topics, using provided recognizer transcripts without known story boundaries
  - S1SU: Speech Retrieval, short topics, using own recognizer without known story boundaries
  - S1TU: Speech Retrieval, terse topics, using own recognizer without known story boundaries
- Optional Cross-Recognizer Condition (may use non-lexical side info, but must run contrast below)
  - CRSU-<SYS_NAME>: Cross-Recognizer Retrieval, short topics, using other participants' recognizer transcripts without known story boundaries
  - CRTU-<SYS_NAME>: Cross-Recognizer Retrieval, terse topics, using other participants' recognizer transcripts without known story boundaries
- Conditional No Non-Lexical Information Condition (required contrast if non-lexical information is used in other conditions)
  - R1SUN: Reference Retrieval, short topics, using human-generated "perfect" transcripts without known story boundaries, no non-lexical info
  - R1TUN: Reference Retrieval, terse topics, using human-generated "perfect" transcripts without known story boundaries, no non-lexical info
  - B1SUN: Baseline Retrieval, short topics, using provided recognizer transcripts without known story boundaries, no non-lexical info
  - B1TUN: Baseline Retrieval, terse topics, using provided recognizer transcripts without known story boundaries, no non-lexical info
  - S1SUN: Speech Retrieval, short topics, using own recognizer without known story boundaries, no non-lexical info
  - S1TUN: Speech Retrieval, terse topics, using own recognizer without known story boundaries, no non-lexical info
  - S2SUN: Speech Retrieval, short topics, using own second recognizer without known story boundaries, no non-lexical info
  - S2TUN: Speech Retrieval, terse topics, using own second recognizer without known story boundaries, no non-lexical info
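As a compact summary of the naming scheme above, the sketch below unpacks a condition code into the test variables from slides 9 and 10. The helper function and its output format are illustrative, not part of the track infrastructure.

```python
COLLECTIONS = {"R1": "reference transcripts", "B1": "baseline recognizer",
               "S1": "own recognizer", "S2": "own second recognizer",
               "CR": "cross-recognizer"}

def parse_condition(code):
    """E.g. 'R1SU', 'B1TUN', 'CRSU-<SYS_NAME>' -> component test variables."""
    code = code.split("-")[0]                       # drop any -<SYS_NAME> suffix
    return {
        "collection": COLLECTIONS[code[:2]],
        "topics": "short" if code[2] == "S" else "terse",
        "boundaries": "unknown" if code[3] == "U" else "known",
        "non-lexical info": "not used (control)" if code.endswith("N") else "allowed",
    }

print(parse_condition("B1TUN"))
# {'collection': 'baseline recognizer', 'topics': 'terse',
#  'boundaries': 'unknown', 'non-lexical info': 'not used (control)'}
```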
12. Test Topics
- 50 topics developed by NIST assessors using similar approach to TREC Ad-Hoc Task
- Short and terse forms of topics were generated
- Hard: Topic 125, 10 relevant stories
  - Short: Provide information pertaining to security violations within the U.S. intelligence community. (.024 average MAP)
  - Terse: U.S. intelligence violations (.019 average MAP)
- Medium: Topic 143, 8 relevant stories
  - Short: How many Americans file for bankruptcy each year? (.505 average MAP)
  - Terse: Americans bankruptcy debts (.472 average MAP)
- Easy: Topic 127, 11 relevant stories
  - Short: Name some countries which permit their citizens to commit suicide with medical assistance. (.887 average MAP)
  - Terse: assisted suicide (.938 average MAP)
13. Test Topic Relevance
14. Topic Difficulty
15. Participants
- Full SDR (recognition and retrieval): Cambridge University, UK; LIMSI, France; Sheffield University, UK
16. Approaches for 2000
- Automatic Speech Recognition
  - HMM, word-based - most
  - NN/HMM hybrid-based - Sheffield
- Retrieval
  - OKAPI probabilistic model (see the BM25 sketch below) - all
  - Blind relevance feedback and parallel-corpus BRF for query expansion - all
- Story boundary unknown retrieval
  - passage windowing, retrieval, and merging - all
- Use of automatically-recognized non-lexical features
  - repeat and commercial detection - CU
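As context for the Okapi bullet above, here is a minimal sketch of BM25-style term weighting, the usual form of the Okapi probabilistic model. The parameter values k1 and b are common defaults, not the settings any SDR site actually used, and the function name is illustrative.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document (bag of words) against a bag-of-words query."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf or term not in doc_freq:
            continue
        # Inverse document frequency (Robertson/Sparck Jones style)
        idf = math.log((num_docs - doc_freq[term] + 0.5) /
                       (doc_freq[term] + 0.5) + 1.0)
        # Term frequency with document-length normalization
        tf_part = (tf[term] * (k1 + 1)) / (
            tf[term] + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * tf_part
    return score
```

Sites combined weights like these with blind relevance feedback: top-ranked documents from a first pass (on the test collection or a parallel newswire corpus) supply extra query terms for a second pass.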
17. ASR Metrics
- Traditional ASR metric
  - Word Error Rate (WER) and Mean Story Word Error Rate (SWER) using SCLITE and LDC reference transcripts
  - WER = (word insertions + word deletions + word substitutions) / total words in reference (see the sketch below)
- LDC created 2 Hub-4-compliant 10-hour subsets for ASR scoring and analyses (LDC-SDR-99 and LDC-SDR-2000)
- Note that there is a 10.3% WER in the collection's human (closed-caption) transcripts
- Note: SDR recognition is not directly comparable to Hub-4 benchmarks due to transcript quality, test set selection method, and word mapping method used in scoring
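To make the WER formula above concrete, here is a toy word error rate computation based on Levenshtein alignment of the word strings. The official scoring used SCLITE, which adds text normalization and word-mapping rules this sketch omits.

```python
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits (sub/ins/del) to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    # WER = (insertions + deletions + substitutions) / reference word count
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the white house said today", "a white house set today"))  # 0.4
```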
18. ASR Performance
19. IR Metrics
- Traditional TREC ad-hoc metric
  - Mean Average Precision (MAP) using TREC_EVAL (see the sketch below)
- Created assessment pools for each topic using top 100 of all retrieval runs
  - Mean pool size: 596 (2.1% of all segments)
  - Min pool size: 209
  - Max pool size: 1309
- NIST assessors created reference relevance assessments from topic pools
- Somewhat artificial for boundary-unknown conditions
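A minimal sketch of the MAP computation that TREC_EVAL reports: average precision per topic (precision at each relevant document retrieved, divided by the number of relevant documents), then the mean over topics. The function names and data layout are illustrative.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    relevant_ids = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank        # precision at this relevant hit
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs, qrels):
    """runs: {topic_id: ranked doc ids}; qrels: {topic_id: relevant doc ids}."""
    aps = [average_precision(docs, qrels.get(topic, [])) for topic, docs in runs.items()]
    return sum(aps) / len(aps) if aps else 0.0
```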
20. Story Boundaries Known Condition
- Retrieval using pre-segmented news stories
  - systems given index of story boundaries for recognition, with IDs for retrieval
  - excluded non-news segments
  - stories are treated as documents
  - systems produce rank-ordered list of story IDs (see the run-format sketch below)
- Document-based scoring
  - score as in other TREC ad hoc tests using TREC_EVAL
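For reference, a rank-ordered list of story IDs is submitted in the standard six-column TREC run format that TREC_EVAL reads (topic, iteration, document ID, rank, score, run tag). The topic number, story IDs, scores, and run tag below are made-up placeholders.

```python
# results: {topic_id: [(story_id, retrieval_score), ...]} sorted best-first;
# all identifiers and scores here are placeholders, not real SDR data.
results = {
    "126": [("VOA19980315.1800.0456", 14.2), ("CNN19980301.2130.0789", 12.7)],
}

with open("myrun.trec", "w") as out:
    for topic_id, ranked in results.items():
        for rank, (story_id, score) in enumerate(ranked, start=1):
            out.write(f"{topic_id} Q0 {story_id} {rank} {score:.4f} MYRUN\n")
```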
21. Story Boundaries Known Retrieval Condition
22. Unknown Story Boundary Condition
- Retrieval using continuous speech stream
  - systems process entire broadcasts for ASR and retrieval with no provided segmentation
  - systems output a single time marker for each relevant excerpt to indicate topical passages
  - this task does NOT attempt to determine topic boundaries
- Time-based scoring (see the mapping sketch below)
  - map each retrieved time marker to a story ID (dummy ID for retrieved non-stories and duplicates)
  - score as usual using TREC_EVAL
  - penalizes duplicate retrieved stories
  - story-based scoring somewhat artificial but expedient
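A sketch of the time-to-story mapping described above, under assumed data structures: a retrieved time marker that falls inside a reference story maps to that story's ID, while markers outside any story, or repeats of an already-retrieved story, get a dummy ID so they score as non-relevant. Names and formats are illustrative.

```python
def map_times_to_stories(time_markers, story_spans):
    """time_markers: ranked [(show_id, time_sec), ...];
    story_spans: {show_id: [(start_sec, end_sec, story_id), ...]}."""
    mapped, seen, dummies = [], set(), 0
    for show_id, t in time_markers:
        story_id = None
        for start, end, sid in story_spans.get(show_id, []):
            if start <= t < end:
                story_id = sid
                break
        if story_id is None or story_id in seen:
            dummies += 1
            mapped.append(f"DUMMY_{dummies}")   # non-story or duplicate retrieval
        else:
            seen.add(story_id)
            mapped.append(story_id)
    return mapped
```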
23. Story Boundaries Unknown Retrieval Condition
24. SDR-2000 Cross-Recognizer Results
25. Conclusions
- ad hoc retrieval in broadcast news domain appears to be a solved problem
  - systems perform well at finding relevant passages in transcripts produced by a variety of recognizers on full unsegmented news broadcasts
  - performance on own recognizer comparable to human reference
- just beginning to investigate use of non-lexical information
- Caveat emptor
  - ASR may still pose serious problems for Question Answering domain, where content errors are fatal
26. Future for Multi-Media Retrieval?
- SDR Track will be sunset
- Other opportunities
  - TREC
    - Question Answering Track
    - New Video Retrieval Track
  - CLEF
    - Cross-language SDR
  - TDT Project
27. TREC-9 SDR Results, Primary Conditions
28. TREC-9 SDR Results, Cross-Recognizer Conditions