Speech Recognition Introduction I

1
Speech Recognition: Introduction I
  • E.M. Bakker

2
Speech Recognition
  • Some Applications
  • An Overview
  • General Architecture
  • Speech Production
  • Speech Perception

3
Speech Recognition
  • Goal: Automatically extract the string of words
    spoken from the speech signal

4
Speech Recognition
  • Goal: Automatically extract the string of words
    spoken from the speech signal
  • How is SPEECH produced? => Characteristics of
    Acoustic Signal

5
Speech Recognition
  • Goal: Automatically extract the string of words
    spoken from the speech signal
  • How is SPEECH perceived? => Important Features

6
Speech Recognition
  • Goal: Automatically extract the string of words
    spoken from the speech signal
  • What LANGUAGE is spoken? => Language Model

7
Speech Recognition
  • Goal: Automatically extract the string of words
    spoken from the speech signal
  • What is in the BOX?

8
Important Components of General SR Architecture
  • Speech Signals
  • Signal Processing Functions
  • Parameterization
  • Acoustic Modeling (Learning Phase)
  • Language Modeling (Learning Phase)
  • Search Algorithms and Data Structures
  • Evaluation

9
Recognition Architectures: A Communication
Theoretic Approach
[Diagram: Message Source -> Linguistic Channel -> Articulatory Channel -> Acoustic Channel; observables: Message, Words, Sounds, Features]
  • Speech recognition problem: P(W|A), where A is the
    acoustic signal and W is the words spoken
  • Objective: minimize the word error rate
  • Approach: maximize P(W|A) during training
  • Bayesian formulation for speech recognition:
    P(W|A) = P(A|W) P(W) / P(A), where A is the
    acoustic signal and W is the words spoken
  • Components:
  • P(A|W): acoustic model (hidden Markov models,
    mixtures)
  • P(W): language model (statistical, finite
    state networks, etc.)
  • The language model typically predicts a small set
    of next words based on knowledge of a finite
    number of previous words (N-grams).
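The Bayesian formulation above can be illustrated with a toy decoder. This is only a sketch: the candidate transcriptions, their scores, and the function name `decode` are made up for illustration and are not from the slides. Because P(A) is the same for every candidate W, it can be dropped when searching for the best W.

```python
import math

def decode(acoustic_loglik, lm_logprob):
    """Return the candidate W that maximizes log P(A|W) + log P(W).

    P(A) is identical for all candidates, so it does not change the argmax.
    """
    return max(acoustic_loglik, key=lambda w: acoustic_loglik[w] + lm_logprob[w])

# Hypothetical scores for one utterance (illustrative values only).
acoustic_loglik = {
    "recognize speech": math.log(0.20),
    "wreck a nice beach": math.log(0.25),
}
lm_logprob = {
    "recognize speech": math.log(0.010),
    "wreck a nice beach": math.log(0.001),
}

best = decode(acoustic_loglik, lm_logprob)
print(best)
```

In this example the acoustic model slightly prefers the second candidate, but the language model prior outweighs that preference.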

10
Recognition Architectures
[Diagram labels: Input Speech, Language Model P(W)]

11
ASR Architecture
Evaluators
Feature Extraction
Recognition Searching Strategies
Speech Database, I/O
HMM Initialisation and Training
Common Base Classes, Configuration and Specification
Language Models
12
Signal Processing Functionality
  • Acoustic Transducers
  • Sampling and Resampling
  • Temporal Analysis
  • Frequency Domain Analysis
  • Cepstral Analysis
  • Linear Prediction and LP-Based Representations
  • Spectral Normalization

13
Acoustic Modeling: Feature Extraction
[Diagram: Input Speech -> Fourier Transform -> Cepstral Analysis -> Perceptual Weighting -> Time Derivative -> Time Derivative]
  • Energy, Mel-Spaced Cepstrum
  • Delta Energy, Delta Cepstrum (after the first time derivative)
  • Delta-Delta Energy, Delta-Delta Cepstrum (after the second time derivative)
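The delta and delta-delta features in the diagram are time derivatives of the static features. A minimal sketch, assuming a simple central difference with edge frames repeated (practical front ends usually use a regression over a wider window; the function names here are hypothetical):

```python
def delta(frames):
    """First-order time derivative of a sequence of feature vectors.

    Uses the central difference (x[t+1] - x[t-1]) / 2 and repeats the
    first and last frames at the edges.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev_f = frames[max(t - 1, 0)]
        next_f = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_f, next_f)])
    return out

def add_dynamic_features(frames):
    """Append delta and delta-delta coefficients to each static frame."""
    d1 = delta(frames)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]

# Four frames of two made-up static features (e.g. energy, one cepstral value).
static = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
features = add_dynamic_features(static)
print(features[1])
```

Each output frame holds the static values followed by their deltas and delta-deltas, so the feature dimension triples.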
14
Acoustic Modeling
  • Dynamic Programming
  • Markov Models
  • Parameter Estimation
  • HMM Training
  • Continuous Mixtures
  • Decision Trees
  • Limitations and Practical Issues of HMM

15
Acoustic Modeling: Hidden Markov Models
  • Acoustic models encode the temporal evolution of
    the features (spectrum).
  • Gaussian mixture distributions are used to
    account for variations in speaker, accent, and
    pronunciation.
  • Phonetic model topologies are simple
    left-to-right structures.
  • Skip states (time-warping) and multiple paths
    (alternate pronunciations) are also common
    features of models.
  • Sharing model parameters is a common strategy to
    reduce complexity.
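
The left-to-right topology with skip states can be written down as a transition matrix. The sketch below is an assumption-laden illustration: the probabilities 0.5/0.4/0.1 and the function name are invented, and the last state simply loops on itself instead of leading to an exit state.

```python
def left_to_right_transitions(n_states, p_stay=0.5, p_next=0.4, p_skip=0.1):
    """Build a transition matrix where state i can move to i, i+1, or i+2.

    Transitions that would leave the model are dropped and the remaining
    weights are renormalized so that every row sums to 1.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        weights = {i: p_stay}
        if i + 1 < n_states:
            weights[i + 1] = p_next
        if i + 2 < n_states:
            weights[i + 2] = p_skip  # skip transition (time-warping)
        total = sum(weights.values())
        for j, w in weights.items():
            A[i][j] = w / total
    return A

A = left_to_right_transitions(3)
for row in A:
    print([round(p, 3) for p in row])
```

All entries below the diagonal are zero, which is what makes the model left-to-right: a state is never revisited once it has been left.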

16
Acoustic Modeling: Parameter Estimation
  • Closed-loop data-driven modeling supervised from
    a word-level transcription.
  • The expectation/maximization (EM) algorithm is
    used to improve our parameter estimates.
  • Computationally efficient training algorithms
    (Forward-Backward) have been crucial.
  • Batch mode parameter updates are typically
    preferred.
  • Decision trees are used to optimize
    parameter-sharing, system complexity, and the
    use of additional linguistic knowledge.
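
The Forward-Backward computation mentioned above supplies the state occupancy probabilities that EM uses to re-estimate parameters. A minimal sketch for a discrete-output HMM, with made-up parameters; it omits the scaling or log arithmetic that long utterances would need to avoid underflow:

```python
def forward_backward(pi, A, B, obs):
    """Return (gamma, likelihood) for a discrete-output HMM.

    gamma[t][i] is P(state at time t is i | all observations).
    """
    n, T = len(pi), len(obs)
    # Forward pass: alpha[t][i] = P(obs[0..t], state_t = i)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([
            sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
            for j in range(n)
        ])
    # Backward pass: beta[t][i] = P(obs[t+1..T-1] | state_t = i)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ]
    likelihood = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    return gamma, likelihood

# Hypothetical two-state left-to-right model with two output symbols.
pi = [1.0, 0.0]
A = [[0.6, 0.4], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1]
gamma, likelihood = forward_backward(pi, A, B, obs)
print(likelihood)
```

In an EM iteration, these posteriors would weight the observations when the output distributions are updated; that update step is not shown here.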

17
Language Modeling
  • Formal Language Theory
  • Context-Free Grammars
  • N-Gram Models and Complexity
  • Smoothing
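
As an illustration of N-gram models and smoothing, here is a small bigram model with add-one (Laplace) smoothing. The corpus and function names are invented for this sketch, and add-one is just the simplest smoothing scheme, not necessarily the one the lecture covers.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their histories, with sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Words that can be predicted: every corpus word plus the end marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return histories, bigrams, vocab

def bigram_prob(prev, word, histories, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))

corpus = ["the cat sat", "the dog sat", "the cat ran"]
histories, bigrams, vocab = train_bigram(corpus)
p_seen = bigram_prob("the", "cat", histories, bigrams, vocab)
p_unseen = bigram_prob("the", "ran", histories, bigrams, vocab)
print(p_seen, p_unseen)
```

Without smoothing, the unseen pair ("the", "ran") would get probability zero and rule out any hypothesis that contains it.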

18
Language Modeling
19
Language Modeling: N-Grams
20
LM: Integration of Natural Language
21
Search Algorithms and Data Structures
  • Basic Search Algorithms
  • Time Synchronous Search
  • Stack Decoding
  • Lexical Trees
  • Efficient Trees

22
Dynamic Programming-Based Search
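A standard example of dynamic programming-based search over an HMM is the Viterbi algorithm. The sketch below is a generic textbook version for a discrete-output HMM, not necessarily the search shown on this slide; the model parameters and the helper `safe_log` are made up for illustration.

```python
import math

def safe_log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(pi, A, B, obs):
    """Return the most likely state sequence, computed in the log domain."""
    n = len(pi)
    score = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    backptr = []
    for o in obs[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Best predecessor of state j at this time step.
            best_i = max(range(n), key=lambda i: score[i] + safe_log(A[i][j]))
            ptr.append(best_i)
            new_score.append(
                score[best_i] + safe_log(A[best_i][j]) + safe_log(B[j][o])
            )
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

pi = [1.0, 0.0]
A = [[0.6, 0.4], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
path = viterbi(pi, A, B, [0, 0, 1, 1])
print(path)
```

Only the best score per state is kept at each time step, so the cost grows linearly with the number of frames rather than exponentially with the number of possible paths.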
23
Recognition Architectures
[Diagram labels: Input Speech, Language Model P(W)]

24
Speech Recognition
  • Goal: Automatically extract the string of words
    spoken from the speech signal
  • How is SPEECH produced?