1
Speech recognition in MUMIS
  • Judith Kessens, Mirjam Wester
  • Helmer Strik

2
Manual transcriptions
  • Transcriptions made by SPEX
  • orthographic transcriptions
  • transcriptions on chunk level (2-3 sec.)
  • Formats
  • .TextGrid → Praat
  • xml-derivatives
  • .pri no time information
  • .skp time information

3
Manual transcriptions
  • Total amount of transcribed matches on ftp-site
    (including the demo matches)
  • Dutch 6 matches
  • German 21 matches
  • English 3 matches
  • Extensions
  • Dutch (_N), German (_G), English (_E)

4
Automatic speech recognition
  • 1. Acoustic preprocessing
  • Acoustic signal → features
  • 2. Speech recognition
  • Acoustic models
  • Language models
  • Lexicon

5
Automatic transcriptions
  • Problem of recorded data
  • Commentaries and stadium noise are mixed
  • Very high noise levels
  • → Recognition of such extremely noisy data is very
    difficult

6
Examples of data
  • Yug-Ned match
  • Dutch: op _t ogenblik wordt in dit stadion de opstelling voorgelezen
  • English: and they wanna make the change before the corner
  • German: und die beiden Tore die die Hollaender bekommen hat haben

7
Examples of data
  • Eng-Dld match
  • Dutch: geeft nu een vrije trap in _t voordeel van Ince
  • English: and phil neville had to really make about three yards to stop <dreisleru> pulling it down and playing it
  • German: wurde von allen englischen Zeitungen aus der Mannschaft

8
Evaluation of aut. transcriptions
WER (%) = 100 × (insertions + deletions + substitutions) / number of words
→ WER can be larger than 100%!
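
The WER definition on this slide can be sketched in a few lines of Python. This is only an illustration, not the scoring tool used in the project: the function names `word_errors` and `wer` are made up here, and the hypothesis sentences are invented. The minimum number of insertions, deletions and substitutions is found with a standard word-level edit distance.

```python
def word_errors(reference, hypothesis):
    """Minimum number of insertions + deletions + substitutions needed to
    turn the reference word sequence into the hypothesis word sequence."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions align i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return 100.0 * word_errors(reference, hypothesis) / n_ref


# Reference taken from the Yug-Ned example; the hypothesis is invented.
print(wer("and they wanna make the change before the corner",
          "and they want to make change before a corner"))

# Two insertions against a one-word reference: the WER exceeds 100.
print(wer("goal", "what a goal"))
```

Because the denominator counts only reference words, a hypothesis with many insertions pushes the WER above 100, which is the point made on the slide.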
9
WERs (all words)
           Dutch   English   German
Yug-Ned    84.5    84.5      77.4
Eng-Dld    83.2    83.3      90.8
10
WERs (player names)
                      Dutch   English   German
Yug-Ned  all words    84.5    84.5      77.4
         names        53.0    48.2      40.9
Eng-Dld  all words    83.2    83.3      90.8
         names        55.0    56.2      77.4
11
WERs versus SNR
                Dutch   English   German
Yug-Ned  WER    84.5    84.5      77.4
         SNR    9       12        19
Eng-Dld  WER    83.2    83.3      90.8
         SNR    8       11        7
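
As a rough illustration of the relation between WER and SNR, the six (SNR, WER) pairs on this slide can be fed into a Pearson correlation. This is a sketch for the reader, not an analysis from the presentation: the helper name `pearson` is made up, and six points are far too few for a firm conclusion.

```python
from math import sqrt

# (SNR, WER in %) pairs copied from this slide, in the order
# Yug-Ned Dutch, English, German, then Eng-Dld Dutch, English, German.
points = [(9, 84.5), (12, 84.5), (19, 77.4),
          (8, 83.2), (11, 83.3), (7, 90.8)]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


r = pearson(points)
print(f"Pearson r between SNR and WER: {r:.2f}")
```

The coefficient comes out negative (about -0.84), consistent with lower WERs at higher SNRs.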
12
Automatic transcriptions
  • The language model (LM) and lexicon (lex) are
    adapted to a specific match
  • Start with a general LM and lex
  • Add player names of the specific match
  • Expand the general LM and lex when more data is
    available
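
The adaptation steps above can be sketched as follows. The sketch treats the lexicon as a plain set of word forms (a real recognition lexicon also holds pronunciations) and measures only the out-of-vocabulary (OOV) rate. The function names and the tiny "general" lexicon are hypothetical; the sentence comes from the Eng-Dld example.

```python
def adapt_lexicon(general_lexicon, player_names):
    """Return a match-specific lexicon: the general one plus the player names."""
    return set(general_lexicon) | {name.lower() for name in player_names}


def oov_rate(words, lexicon):
    """Percentage of words in the transcription that are not in the lexicon."""
    if not words:
        return 0.0
    missing = [w for w in words if w.lower() not in lexicon]
    return 100.0 * len(missing) / len(words)


# Hypothetical general lexicon; a real one would hold thousands of entries.
general = {"geeft", "nu", "een", "vrije", "trap", "in", "_t", "voordeel", "van"}

# Transcription taken from the Eng-Dld example.
utterance = "geeft nu een vrije trap in _t voordeel van Ince".split()

print(oov_rate(utterance, general))                           # before adaptation
print(oov_rate(utterance, adapt_lexicon(general, ["Ince"])))  # after adaptation
```

In this toy case the only OOV word is the player name, so adding the match's names removes it. The language model would need a matching update so that the new names also receive a probability.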

13
WERs for various amounts of data
14
Oracle experiments - ICSLP02
  • Due to limited amount of material we started off
    with oracle experiments
  • Language models are trained on target match
  • Acoustic models are trained on part of target
    match or other match
  • → Much lower WERs

15
Summary of results
  • Acoustic model training
  • Leaving out non-speech chunks does not hurt
    recognition performance
  • Using more training data is beneficial, but more
    importantly:
  • The SNRs of the training and test data should be
    matched

16
Summary of results
  • WERs are SNR-dependent

(tested on Yug-Ned match)
17
Summary of results
Split words into categories, i.e. function words,
content words and football player names:
WER function words > WER content words > WER names
(tested on Yug-Ned match)
18
Summary of results
  • Noise reduction tool (FTNR) → small improvement

19
Ongoing work
  • Techniques to lower WERs
  • Tuning of the generic language model
  • Defining different classes
  • Reduction of OOV words in lexicon and in the
    language model (using more material)
  • Speaker Adaptation in HTK
  • (note: all other experiments are being carried
    out using Phicos)

20
Ongoing work
  • Noise robustness
  • Extension of the acoustic models by using double
    deltas.
  • Histogram Normalization and FTNR.
  • SNR dependent acoustic models.
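
For readers unfamiliar with double deltas: each static feature vector is extended with its first and second time derivatives. The slides do not say how these are computed. The sketch below assumes the common regression formula over a window of two frames on each side, with the first and last frame repeated at the edges; the function names are made up for illustration.

```python
def deltas(frames, window=2):
    """Regression-based time derivatives of a sequence of feature vectors.

    frames: list of equal-length lists of floats, one list per time frame.
    Frames beyond either edge are replaced by the first or last frame.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(n_frames):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, n_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                num += n * (ahead - behind)
            row.append(num / denom)
        result.append(row)
    return result


def add_double_deltas(frames, window=2):
    """Append deltas and double deltas (deltas of the deltas) to each frame."""
    first = deltas(frames, window)
    second = deltas(first, window)
    return [s + d1 + d2 for s, d1, d2 in zip(frames, first, second)]


# Toy input: one static coefficient per frame, rising by 1.0 per frame.
static = [[float(t)] for t in range(7)]
features = add_double_deltas(static)
print(features[3])  # static value, delta, double delta of the middle frame
```

Each frame comes out three times as long as the static vector. For the linear ramp used here, the middle frame has a delta of 1.0 and a double delta of 0.0.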

21
Recommendations
  • Acoustic modeling
  • Record commentaries and stadium noise separately
  • Speaker adaptation
  • - Transcribe characteristics of commentator
  • - Collect more speech data of commentator

22
Recommendations
  • Lexicon and language modeling
  • Collect orthographic transcriptions of spoken
    material, instead of written material
  • Subtitles
  • Closed captions