1
Articulatory Feature-based Speech Recognition:
A Proposal for the 2006 JHU Summer Workshop on
Language Engineering
November 12, 2005
  • Potential team members to date:
    • Karen Livescu (presenter)
    • Simon King
    • Florian Metze
    • Jeff Bilmes
    • Mark Hasegawa-Johnson
    • Ozgur Cetin
    • Kate Saenko
2
Dynamic Bayesian network implementation:
The context-independent case
Example DBN with 3 features
3
Combination of articulatory phonology-based
coarticulation modeling with IPA feature-based
acoustic modeling (deterministic mapping)
  • Suggests a potential work plan:
    • 1st half of workshop: Sub-teams work in
      parallel on
      • (1) Set of features and classifiers for
        acoustic model, using only articulatory
        "ground truth" and acoustics
      • (2) Aspects of hidden structure (asynchrony,
        substitutions, context dependency), using
        only articulatory "ground truth" and words
    • 2nd half of workshop: Integrate most successful
      methods from 1st half
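
To make the "asynchrony" aspect of hidden structure more concrete, here is a minimal sketch under stated assumptions. It is not the model proposed on the slides. It assumes each articulatory feature stream keeps its own index into a word's sequence of target positions, and that the streams may drift apart by at most a fixed number of positions. The function name `allowed_joint_states` and the parameter `max_async` are hypothetical, introduced only for illustration.

```python
from itertools import product


def allowed_joint_states(num_positions, num_features=3, max_async=1):
    """Enumerate joint index tuples for several feature streams.

    Hypothetical illustration: each stream has an index in
    range(num_positions); a joint state is allowed only if the
    largest and smallest indices differ by at most max_async.
    """
    states = []
    for idx in product(range(num_positions), repeat=num_features):
        # Keep the tuple only if no stream is too far ahead of another.
        if max(idx) - min(idx) <= max_async:
            states.append(idx)
    return states


if __name__ == "__main__":
    for limit in (0, 1, 3):
        n = len(allowed_joint_states(num_positions=4, max_async=limit))
        print(f"max_async={limit}: {n} joint states")
```

With `max_async=0` the streams are fully synchronous, so the joint state space is no larger than that of a single stream. Loosening the bound grows the space quickly, which is one reason the degree of allowed asynchrony is a modeling choice worth studying separately.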

4
Resources
  • Tools
    • GMTK
    • HTK
    • Intel AVCSR toolkit
  • Data
    • Audio-only
      • SVitchboard (CSTR Edinburgh): Small-vocab,
        continuous, conversational
      • PhoneBook: Medium-vocab, isolated-word, read
      • (Switchboard rescoring? LVCSR)
    • Audio-visual
      • AVTIMIT (MIT): Medium-vocab, continuous, read,
        added noise
      • Digit strings database (MIT): Continuous, read,
        naturalistic setting (noise and video
        background)
      • AVICAR (UIUC)
    • Articulatory measurements
      • X-ray microbeam database (U. Wisconsin): Many
        speakers, large-vocab, isolated-word and
        continuous
      • MOCHA (QMUC, Edinburgh): Few speakers,
        medium-vocab, continuous
      • Others?
    • Manual transcriptions: ICSI Berkeley Switchboard
      transcription project

5
Questions to address (soon)
  • Audio-only, audio-visual only, or both?
    • Audio-only
      • Better understood by current team members
      • Has more spontaneous speech data
    • Audio-visual
      • Potentially, many more interesting phenomena
        in read data
      • Visual observations more closely tied to
        articulatory features
      • Smaller tasks → faster turnaround time →
        higher impact?
  • Can we reliably decouple investigation of
    acoustic modeling and pronunciation modeling?
  • Evaluation via measures other than word error
    rate
    • Forced alignments
    • Articulatory tracking
    • Reasonableness of model parameters
    • (Multi-style ASR: Train on slow, test on fast?)
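
For reference, the word error rate that these alternative measures would complement is conventionally the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below shows that computation, plus one hypothetical way to score a forced alignment by frame-level label agreement. The function names and the frame-agreement measure are assumptions for illustration and do not come from the slides.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def frame_agreement(ref_labels, hyp_labels):
    """Hypothetical alignment score: fraction of frames with equal labels."""
    if len(ref_labels) != len(hyp_labels) or not ref_labels:
        raise ValueError("label sequences must be non-empty and equal length")
    matches = sum(r == h for r, h in zip(ref_labels, hyp_labels))
    return matches / len(ref_labels)


if __name__ == "__main__":
    print(word_error_rate("a b c d", "a c d"))
    print(frame_agreement(["a", "a", "b"], ["a", "b", "b"]))
```

A frame-level score of this kind can differ between two systems even when their word error rates are identical, which is the motivation for looking beyond word error rate in the first place.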