1
RESEARCH PRESENTATION
An overview of work done so far in 2006
Madhulika Pannuri
Intelligent Electronic Systems
Human and Systems Engineering
Center for Advanced Vehicular Systems
2
  • Abstract
  • Language Model ABNF
  • The LanguageModel ABNF reads in a set of
    productions in the form of ABNF and converts them
    to BNF. These productions are passed to
    LanguageModel BNF in the form they are received.
  • Optimum Time Delay Estimation
  • Computes the optimum time delay for
    reconstructing the phase space. Auto mutual
    information is used to find the optimum time
    delay.
  • Dimension Estimation
  • The dimension of an attractor is a measure of
    its geometric scaling properties and is the most
    basic property of an attractor. Although there
    are many dimension measures, we will implement
    the correlation dimension and the Lyapunov
    dimension; the correlation dimension is defined
    below.
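As background for the dimension estimation work, one standard way to make
"geometric scaling" precise is the Grassberger-Procaccia correlation
dimension (the notation here is the usual one, supplied as a reference):

    C(r) = \frac{2}{N(N-1)} \sum_{i<j} \Theta( r - \lVert x_i - x_j \rVert ),
    \qquad
    D_2 = \lim_{r \to 0} \frac{\ln C(r)}{\ln r}

where \Theta is the Heaviside step function, the x_i are points on the
reconstructed attractor, and N is the number of points.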

3
Language Model ABNF
  • Why convert from ABNF to BNF?
  • ABNF balances compactness and simplicity with
    reasonable representational power.
  • Differences: The differences between BNF and ABNF
    involve naming rules, repetition, alternatives,
    order-independence, and value ranges.
  • Removing the meta symbols makes it easy to
    convert the grammar to finite state machines
    (IHD).
  • Problem: Converting any given ABNF to BNF is
    impossible, so the algorithm is generalized so
    that most expressions can be converted.
  • The steps for conversion involve removing the
    meta symbols one after another. There were
    complications such as multiple nesting.
  • Ex: ((a, n) (b s))

4
Language Model ABNF
  • Concatenation
  • Removing concatenation requires introducing two
    new rules with new names. The last two
    productions resulting from the conversion are not
    valid CFG productions, but they are shorter than
    the original.
  • Alternation
  • The production is replaced with a pair of
    productions, which may or may not be legal CFG
    productions.
  • Kleene Star
  • A new variable is introduced and the production
    is replaced. This gives rise to an epsilon
    transition (see the sketch after this list).
  • Kleene Plus
  • Similar to the Kleene star, but with no null
    transition.
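A minimal sketch of the Kleene star/plus rewrite described above, in
Python; the tuple-based production format and the helper name are
illustrative assumptions, not the actual LanguageModel ABNF code:

    import itertools

    _fresh = itertools.count(1)

    def remove_kleene(lhs, symbol, op):
        """Rewrite `lhs -> symbol*` (or `symbol+`) into plain BNF
        productions by introducing a fresh nonterminal."""
        k = f"_K{next(_fresh)}"                  # fresh nonterminal
        rules = [(lhs, [k]), (k, [symbol, k])]   # A -> K,  K -> B K
        if op == "*":
            rules.append((k, []))                # K -> epsilon (null transition)
        else:                                    # op == "+"
            rules.append((k, [symbol]))          # K -> B (no null transition)
        return rules

    # Example: A -> b* becomes A -> _K1, _K1 -> b _K1, _K1 -> epsilon
    print(remove_kleene("A", "b", "*"))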

5
Optimum time delay estimation
  • Why calculate the time delay?
  • If we have a trajectory from a chaotic system and
    we only have data from one of the system
    variables, Takens' embedding theorem says that we
    can reconstruct a copy of the system's attractor
    by lagging the time series to embed it in more
    dimensions.
  • In other words, if we have a point (x, y, z)
    moving along some strange attractor and we can
    only measure z(t), we can plot (z(t), z(t + tau),
    z(t + 2 tau)), and the resulting object will be
    topologically identical to the original
    attractor.
  • The method of time delays provides a relatively
    simple way of constructing an attractor from a
    single experimental time series (see the sketch
    after this list).
  • So, how do we choose the time delay?
  • Choosing the optimum time delay is not trivial,
    because the dynamical properties of the
    reconstructed attractor must be amenable to
    subsequent analysis.
  • For an infinite amount of noise-free data, we are
    free to choose the time delay arbitrarily.
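A minimal sketch of this delay-coordinate embedding, assuming NumPy; the
function name and interface are illustrative:

    import numpy as np

    def delay_embed(s, dim, tau):
        """Embed a scalar series s into dim dimensions using the lagged
        copies s(t), s(t + tau), ..., s(t + (dim - 1) * tau)."""
        n = len(s) - (dim - 1) * tau          # number of complete delay vectors
        return np.column_stack([s[i * tau : i * tau + n] for i in range(dim)])

    # Example: reconstruct a 3-D object from a single measured component
    z = np.sin(np.linspace(0, 100, 5000))     # stand-in for experimental data
    attractor = delay_embed(z, dim=3, tau=25) # array of shape (4950, 3)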

6
Optimum time delay estimation
  • For small values of tau, s(t) and s(t + tau) are
    very close to each other in numerical value, and
    hence they are not independent of each other.
  • For large values of tau, s(t) and s(t + tau) are
    completely independent of each other, and any
    connection between them in the case of a chaotic
    attractor is effectively random because of the
    butterfly effect.
  • We need a criterion for an intermediate choice:
    tau large enough that s(t) and s(t + tau) are not
    trivial copies of each other, but not so large
    that they become completely independent in the
    statistical sense.
  • The time delay is a multiple of the sampling time
    (data is available only at these times).
  • There are four common methods for determining an
    optimum time delay.
  • 1. Visual inspection of reconstructed attractors
  • The simplest way to choose tau.
  • Consider successively larger values of tau and
    visually inspect the phase portrait of the
    resulting attractor.
  • Choose the tau value that appears to give the
    most spread-out attractor.
  • Disadvantage
  • This produces reasonable results only for
    relatively simple systems.

7
Methods to estimate optimum time delay
  • 2. Dominant period relationship
  • We use the rule of thumb that the time delay is
    one quarter of the dominant period.
  • Advantage
  • A quick and easy method for determining tau.
  • Disadvantages
  • Can only be used for low-dimensional systems.
  • Many complex systems do not possess a single
    dominant frequency.
  • 3. The autocorrelation function
  • The autocorrelation function C compares two data
    points in the time series separated by the delay
    tau (a standard form of its definition is given
    after this list).
  • The delay for the attractor reconstruction, tau,
    is then taken at a specific threshold value of C.
  • The behavior of C is inconsistent.
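Assuming the usual normalized definition (supplied here as a reference),
the autocorrelation function at delay tau is

    C(\tau) = \frac{ \sum_t [s(t) - \bar{s}] [s(t + \tau) - \bar{s}] }
                   { \sum_t [s(t) - \bar{s}]^2 }

where \bar{s} is the mean of the time series.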

8
Methods to estimate optimum time delay
  • 4. Minimum auto mutual information method
  • The mutual information measures how much knowing
    s(t) tells us about s(t + tau) (a standard form
    is given after this list).
  • When the mutual information is at a minimum, the
    attractor is as spread out as possible. This
    condition for the choice of delay time is known
    as the minimum mutual information criterion.
  • Practical implementation
  • To calculate the mutual information, the 2D
    reconstruction of the attractor is partitioned
    into a grid of Nc columns and Nr rows.
  • Discrete probability density functions for X(i)
    and X(i + tau) are generated by summing the data
    points in each row and column of the grid,
    respectively, and dividing by the total number of
    attractor points.
  • The joint probability of occurrence P(k, l) of
    the attractor in any particular box is calculated
    by counting the number of discrete points in the
    box and dividing by the total number of points on
    the attractor trajectory.
  • The value of tau that gives the first minimum of
    the mutual information is taken as the attractor
    reconstruction delay.
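Assuming the standard grid-based estimator that the bullets describe, the
mutual information at delay tau is

    I(\tau) = \sum_{k,l} P(k, l) \log_2 \frac{P(k, l)}{P(k) P(l)}

and a minimal sketch of the full procedure follows (NumPy histograms stand
in for the grid partition; the function names are illustrative):

    import numpy as np

    def auto_mutual_information(s, tau, n_bins=16):
        """Estimate I(tau) by partitioning the 2D reconstruction
        (s(t), s(t + tau)) into an n_bins x n_bins grid."""
        x, y = s[:-tau], s[tau:]
        joint, _, _ = np.histogram2d(x, y, bins=n_bins)
        p_xy = joint / joint.sum()            # joint probability P(k, l)
        p_x = p_xy.sum(axis=1)                # marginal P(k) for s(t)
        p_y = p_xy.sum(axis=0)                # marginal P(l) for s(t + tau)
        nz = p_xy > 0                         # avoid log(0) on empty boxes
        return np.sum(p_xy[nz] * np.log2(p_xy[nz] / np.outer(p_x, p_y)[nz]))

    def first_minimum_delay(s, max_tau=100, n_bins=16):
        """Return the first tau at which the AMI has a local minimum."""
        ami = [auto_mutual_information(s, t, n_bins)
               for t in range(1, max_tau + 1)]
        for i in range(1, len(ami) - 1):
            if ami[i] < ami[i - 1] and ami[i] < ami[i + 1]:
                return i + 1                  # taus are 1-indexed
        return None                           # no interior minimum found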

9
Method used to calculate Mutual Information
10
AMI plots for sine using IFC
11
Lorenz time series
12
AMI for Lorenz time series (IFC)
13
AMI plots with white noise added
14
Attractor variation with tau value
15
Doubly Stochastic Systems
  • The 1-coin model is observable because the output
    sequence can be mapped to a specific sequence of
    state transitions.
  • The remaining models are hidden because the
    underlying state sequence cannot be directly
    inferred from the output sequence.

16
Discrete Markov Models
17
Markov Models Are Computationally Simple
18
Training Recipes Are Complex And Iterative
19
Bootstrapping Is Key In Parameter Reestimation
20
The Expectation-Maximization Algorithm (EM)
21
Controlling Model Complexity
22
Data-Driven Parameter Sharing Is Crucial
23
Context-Dependent Acoustic Units
24
Machine Learning in Acoustic Modeling
  • Structural optimization is often guided by an
    Occam's Razor approach
  • Trading off goodness of fit against model
    complexity
  • Examples: MDL, BIC, AIC, Structural Risk
    Minimization, Automatic Relevance Determination
    (BIC is illustrated below)
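As one concrete instance of this trade-off, the Bayesian Information
Criterion scores a model with k free parameters, maximized likelihood
\hat{L}, and n training points as

    \mathrm{BIC} = k \ln n - 2 \ln \hat{L}

so a better fit lowers the score while every added parameter raises it;
the lowest-scoring model is preferred.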

25
Summary
  • What we haven't talked about: duration models,
    adaptation, normalization, confidence measures,
    posterior-based scoring, hybrid systems,
    discriminative training, and much, much more
  • Applications of these models to language (Hazen),
    dialog (Phillips, Seneff), machine translation
    (Vogel, Papineni), and other HLT applications
  • Machine learning approaches to human language
    technology are still in their infancy (Bilmes)
  • A mathematical framework for the integration of
    knowledge and metadata will be critical in the
    next 10 years.
  • Information extraction in a multilingual
    environment -- a time of great opportunity!

26
Appendix: Relevant Publications
  • Useful textbooks
  • X. Huang, A. Acero, and H.W. Hon, Spoken Language
    Processing: A Guide to Theory, Algorithm, and
    System Development, Prentice Hall, ISBN
    0-13-022616-5, 2001.
  • D. Jurafsky and J.H. Martin, Speech and Language
    Processing: An Introduction to Natural Language
    Processing, Computational Linguistics, and Speech
    Recognition, Prentice-Hall, ISBN 0-13-095069-6,
    2000.
  • F. Jelinek, Statistical Methods for Speech
    Recognition, MIT Press, ISBN 0-262-10066-5,
    1998.
  • L.R. Rabiner and B.W. Juang, Fundamentals of
    Speech Recognition, Prentice-Hall, ISBN
    0-13-015157-2, 1993.
  • J. Deller, et al., Discrete-Time Processing of
    Speech Signals, MacMillan Publishing Co., ISBN
    0-7803-5386-2, 2000.
  • R.O. Duda, P.E. Hart, and D.G. Stork, Pattern
    Classification, Second Edition, Wiley
    Interscience, ISBN 0-471-05669-3, 2000
    (supporting material available at
    http://rii.ricoh.com/~stork/DHS.html).
  • D. MacKay, Information Theory, Inference, and
    Learning Algorithms, Cambridge University Press,
    2003.
  • Relevant online resources
  • Intelligent Electronic Systems,
    http://www.cavs.msstate.edu/hse/ies, Center for
    Advanced Vehicular Systems, Mississippi State
    University, Mississippi State, Mississippi, USA,
    June 2005.
  • Internet-Accessible Speech Recognition
    Technology, http://www.cavs.msstate.edu/hse/ies/projects/speech,
    June 2005.
  • Speech and Signal Processing Demonstrations,
    http://www.cavs.msstate.edu/hse/ies/projects/speech/software/demonstrations,
    June 2005.
  • Fundamentals of Speech Recognition,
    http://www.isip.msstate.edu/publications/courses/ece_8463,
    September 2004.

27
Appendix: Relevant Resources

28
Appendix: Public Domain Speech Recognition
Technology
  • Speech recognition
  • State of the art
  • Statistical (e.g., HMM)
  • Continuous speech
  • Large vocabulary
  • Speaker independent
  • Goal: Accelerate research
  • Flexibility, extensibility, modularity
  • Efficient (C, parallel processing)
  • Easy to use (documentation)
  • Toolkits, GUIs
  • Benefit: Technology
  • Standard benchmarks
  • Conversational speech

29
Appendix: IES Is More Than Just Software
30
Appendix: Nonlinear Statistical Modeling of Speech
31
Appendix: An Algorithm Retrospective of HLT
  • Observations
  • Information theory preceded modern computing.
  • Early research focused on basic science.
  • Computing capacity has enabled engineering
    methods.
  • We are now knowledge-challenged.

32
A Historical Perspective of Prominent Disciplines
  • Observations
  • The field is continually accumulating new
    expertise.
  • As the obvious mathematical techniques are
    exhausted (the low-hanging fruit), there will be
    a return to basic science (e.g., fMRI brain
    activity imaging).

33
Evolution of Knowledge and Intelligence in HLT
Systems
  • A number of fundamental problems still remain
    (e.g., channel and noise robustness, less dense
    or less common languages).
  • The solution will require approaches that use
    expert knowledge from related, more dense domains
    (e.g., similar languages) and the ability to
    learn from small amounts of target data (e.g.,
    autonomic learning).

34
Appendix The Impact of Supercomputers on Research
  • Total available cycles for speech research from
    1983 to 1993: 90 TeraMIPS
  • MS State Empire cluster (1,000 1-GHz processors):
    90 TeraMIPS per day
  • A Day in a Life: 24 hours of idle time on a
    modern supercomputer is equivalent to 10 years of
    speech research at Texas Instruments!
  • Cost: $1M is the nominal cost for scientific
    computing (from a 1-MIP VAX in 1983 to a
    1,000-node supercomputer)