Title: RESEARCH PRESENTATION
1. RESEARCH PRESENTATION
An overview of work done so far in 2006
Madhulika Pannuri
Intelligent Electronic Systems, Human and Systems Engineering, Center for Advanced Vehicular Systems
2. ABSTRACT
- Language Model ABNF
  - The LanguageModel ABNF reads in a set of productions in ABNF form and converts them to BNF. These productions are passed to LanguageModel BNF in the form they are received.
- Optimum time delay estimation
  - Computes the optimum time delay for reconstructing the phase space. Auto mutual information is used to find the optimum time delay.
- Dimension estimation
  - The dimension of an attractor is a measure of its geometric scaling properties and is the most basic property of an attractor. Although many dimension measures exist, we will implement the Correlation Dimension and the Lyapunov Dimension.
3. Language Model ABNF
- Why convert from ABNF to BNF?
  - ABNF balances compactness and simplicity with reasonable representational power.
  - Differences: the differences between BNF and ABNF involve naming rules, repetition, alternatives, order-independence, and value ranges.
  - Removing the meta symbols makes it easy to convert the grammar to finite state machines (IHD).
  - Problem: converting an arbitrary ABNF grammar to BNF is not always possible, so the algorithm is generalized so that most expressions can be converted.
  - The conversion removes the meta symbols one after the other. There were complications such as multiple nesting.
  - Example: ((a, n) (b s))
4. Language Model ABNF
- Concatenation
  - Removing concatenation requires two new rules with new names to be introduced. The last two productions resulting from the conversion are not valid CFGs, but they are shorter than the original.
- Alternation
  - The production is replaced with a pair of productions. They may or may not be legal CFGs.
- Kleene Star
  - A new variable is introduced and the production is replaced. This gives rise to an epsilon transition (see the example below).
- Kleene Plus
  - Similar to the Kleene Star, but with no null (epsilon) transition.
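A hedged sketch of the star and plus rewrites described above (the fresh nonterminal N and the exact rule shapes are illustrative; the names produced by LanguageModel ABNF may differ):

  \[
  \begin{aligned}
  \text{Kleene star:}\quad & A = \alpha\,(\beta)^{*}\,\gamma \;\Longrightarrow\; A \rightarrow \alpha\,N\,\gamma, \qquad N \rightarrow \beta\,N \mid \epsilon \\
  \text{Kleene plus:}\quad & A = \alpha\,(\beta)^{+}\,\gamma \;\Longrightarrow\; A \rightarrow \alpha\,N\,\gamma, \qquad N \rightarrow \beta\,N \mid \beta
  \end{aligned}
  \]

The star rewrite is the source of the epsilon transition; the plus rewrite replaces the epsilon alternative with a single beta, so no null transition is introduced.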
5. Optimum time delay estimation
- Why calculate the time delay?
  - If we have a trajectory from a chaotic system and we only have data from one of the system variables, there is a theorem that says we can reconstruct a copy of the system's attractor by lagging the time series to embed it in more dimensions.
  - In other words, if we have a point F(x, y, z, t) that lies on some strange attractor and we can only measure F(z, t), we can plot F(z, z + N, z + 2N, t), and the resulting object will be topologically identical to the original attractor (see the embedding sketch at the end of this slide).
  - The method of time delays provides a relatively simple way of constructing an attractor from a single experimental time series.
- So, how do we choose the time delay?
  - Choosing the optimum time delay is not trivial, since the dynamical properties of the reconstructed attractor must remain amenable to subsequent analysis.
  - For an infinite amount of noise-free data, we are free to choose the time delay arbitrarily.
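A minimal sketch of the delay-coordinate embedding described above, assuming the time series is a 1-D NumPy array sampled uniformly in time (function and variable names are illustrative, not taken from the IES code):

    import numpy as np

    def delay_embed(s, tau, dim=3):
        """Build delay vectors (s[i], s[i + tau], ..., s[i + (dim - 1) * tau])."""
        n = len(s) - (dim - 1) * tau
        return np.column_stack([s[k * tau : k * tau + n] for k in range(dim)])

    # Example: embed one measured coordinate with a delay of 10 samples
    t = np.arange(0.0, 50.0, 0.01)
    z = np.sin(t) + 0.5 * np.sin(2.2 * t)   # stand-in for a measured variable
    trajectory = delay_embed(z, tau=10, dim=3)

Plotting the three columns of trajectory against each other gives the reconstructed attractor whose spread is compared for different tau values.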
6. Optimum time delay estimation
- For small values of tau, s(t) and s(t + tau) are very close to each other in numerical value, and hence they are not independent of each other.
- For large values of tau, s(t) and s(t + tau) are essentially unrelated, and for a chaotic attractor any connection between them appears random because of the butterfly effect.
- We need an intermediate choice: tau large enough that s(t) and s(t + tau) carry independent information, but not so large that they are completely unrelated in a statistical sense.
- The time delay is a multiple of the sampling time (data is available only at these times).
- There are four common methods for determining an optimum time delay.
- 1. Visual inspection of reconstructed attractors
  - The simplest way to choose tau.
  - Consider successively larger values of tau and visually inspect the phase portrait of the resulting attractor.
  - Choose the tau value that appears to give the most spread-out attractor.
  - Disadvantage
    - Only produces reasonable results for relatively simple systems.
7. Methods to estimate the optimum time delay
- 2. Dominant period relationship
  - We use the property that the time delay is one quarter of the dominant period.
  - Advantage
    - Quick and easy method for determining tau.
  - Disadvantages
    - Can only be used for low-dimensional systems.
    - Many complex systems do not possess a single dominant frequency.
- 3. The autocorrelation function
  - The autocorrelation function C compares two data points in the time series separated by the delay tau, and is defined below.
  - The delay for the attractor reconstruction, tau, is then taken at a specific threshold value of C.
  - The behavior of C is inconsistent.
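The formula for C appeared as an image on the original slide; a standard normalized estimate, assumed here to match the form that was shown, is

  \[
  C(\tau) \;=\; \frac{\sum_{t}\bigl[s(t) - \bar{s}\bigr]\bigl[s(t + \tau) - \bar{s}\bigr]}{\sum_{t}\bigl[s(t) - \bar{s}\bigr]^{2}},
  \]

where \(\bar{s}\) is the mean of the time series; tau is then taken where C first falls below the chosen threshold (e.g., 1/e or zero).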
8. Feature Extraction in Speech Recognition
- 4. Minimum auto mutual information method
  - The mutual information is given by the expression below.
  - When the mutual information is at a minimum, the attractor is as spread out as possible. This condition for the choice of delay time is known as the minimum mutual information criterion.
- Practical implementation
  - To calculate the mutual information, the 2-D reconstruction of the attractor is partitioned into a grid of Nc columns and Nr rows.
  - Discrete probability density functions for X(i) and X(i + tau) are generated by summing the data points in each row and column of the grid, respectively, and dividing by the total number of attractor points.
  - The joint probability of occurrence P(k, l) of the attractor in any particular box is calculated by counting the number of discrete points in the box and dividing by the total number of points on the attractor trajectory.
  - The value of tau which gives the first minimum is the attractor reconstruction delay.
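The mutual information expression was shown as an image in the original deck; the standard grid-based estimate consistent with the description above is

  \[
  I(\tau) \;=\; \sum_{k=1}^{N_c}\sum_{l=1}^{N_r} P(k, l)\,\log_{2}\frac{P(k, l)}{P_{x}(k)\,P_{y}(l)},
  \]

where \(P_{x}(k)\) and \(P_{y}(l)\) are the marginal probabilities obtained from the columns and rows of the grid, and \(P(k, l)\) is the joint probability of box \((k, l)\).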
9. Method used to calculate Mutual Information
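The original slide illustrated this method with a figure; a minimal NumPy sketch of the grid-based AMI estimate (the bin count, function names, and first-minimum search are illustrative assumptions) might look like:

    import numpy as np

    def auto_mutual_information(s, tau, n_bins=16):
        """Grid-based estimate of I(tau) between s(t) and s(t + tau)."""
        x, y = s[:-tau], s[tau:]
        joint, _, _ = np.histogram2d(x, y, bins=n_bins)   # grid of box counts
        p_xy = joint / joint.sum()                        # joint probabilities
        p_x = p_xy.sum(axis=1, keepdims=True)             # column marginals
        p_y = p_xy.sum(axis=0, keepdims=True)             # row marginals
        nz = p_xy > 0                                     # avoid log(0)
        return np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz]))

    def first_minimum_delay(s, max_tau=100):
        """Return the tau at the first local minimum of the AMI curve."""
        ami = [auto_mutual_information(s, tau) for tau in range(1, max_tau + 1)]
        for i in range(1, len(ami) - 1):
            if ami[i] < ami[i - 1] and ami[i] < ami[i + 1]:
                return i + 1                              # taus start at 1
        return int(np.argmin(ami)) + 1                    # fallback: global minimum

Running first_minimum_delay on a sampled Lorenz series should produce AMI curves of the kind shown on the following slides.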
10. AMI plots for sine using IFC
11. Lorenz time series
12. AMI for Lorenz time series (IFC)
13. AMI plots with white noise added
14. Attractor variation with tau value
15. Doubly Stochastic Systems
- The 1-coin model is observable because the output sequence can be mapped to a specific sequence of state transitions.
- The remaining models are hidden because the underlying state sequence cannot be directly inferred from the output sequence.
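A small sketch contrasting the two cases (all probabilities are illustrative, not taken from the slide): in a 1-coin model the output symbol identifies the state that produced it, while in a 2-coin model both hidden coins can emit either symbol, so the state sequence stays hidden.

    import random

    # 1-coin (observable) model: the states ARE the outputs, so observing
    # "H" or "T" tells us exactly which state the model was in.
    one_coin = {"H": 0.6, "T": 0.4}

    # 2-coin (hidden) model: two biased coins with a switching rule; we only
    # ever see heads or tails, never which coin was flipped.
    transition = {"coin1": {"coin1": 0.7, "coin2": 0.3},
                  "coin2": {"coin1": 0.4, "coin2": 0.6}}
    emission = {"coin1": {"H": 0.9, "T": 0.1},
                "coin2": {"H": 0.2, "T": 0.8}}

    def sample_hidden(n, state="coin1"):
        """Generate n outputs; the underlying state sequence is not observable."""
        outputs = []
        for _ in range(n):
            outputs.append("H" if random.random() < emission[state]["H"] else "T")
            state = "coin1" if random.random() < transition[state]["coin1"] else "coin2"
        return outputs

    print(sample_hidden(10))   # e.g. ['H', 'H', 'T', ...] -- which coin made each?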
16. Discrete Markov Models
17. Markov Models Are Computationally Simple
18. Training Recipes Are Complex And Iterative
19. Bootstrapping Is Key In Parameter Reestimation
20. The Expectation-Maximization Algorithm (EM)
21. Controlling Model Complexity
22. Data-Driven Parameter Sharing Is Crucial
23. Context-Dependent Acoustic Units
24. Machine Learning in Acoustic Modeling
- Structural optimization is often guided by an Occam's Razor approach
  - Trading off goodness of fit against model complexity
  - Examples: MDL, BIC, AIC, Structural Risk Minimization, Automatic Relevance Determination (a BIC example follows)
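As one concrete instance of this trade-off (the Bayesian Information Criterion; the other criteria differ mainly in the complexity penalty):

  \[
  \mathrm{BIC} \;=\; -2\,\ln \hat{L} \;+\; k\,\ln n,
  \]

where \(\hat{L}\) is the maximized likelihood of the model, \(k\) is the number of free parameters, and \(n\) is the number of training samples; the model with the lowest BIC balances goodness of fit against complexity.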
25. Summary
- What we haven't talked about: duration models, adaptation, normalization, confidence measures, posterior-based scoring, hybrid systems, discriminative training, and much, much more
- Applications of these models to language (Hazen), dialog (Phillips, Seneff), machine translation (Vogel, Papineni), and other HLT applications
- Machine learning approaches to human language technology are still in their infancy (Bilmes)
- A mathematical framework for the integration of knowledge and metadata will be critical in the next 10 years.
- Information extraction in a multilingual environment -- a time of great opportunity!
26. Appendix: Relevant Publications
- Useful textbooks
  - X. Huang, A. Acero, and H.W. Hon, Spoken Language Processing - A Guide to Theory, Algorithm, and System Development, Prentice Hall, ISBN 0-13-022616-5, 2001.
  - D. Jurafsky and J.H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, ISBN 0-13-095069-6, 2000.
  - F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, ISBN 0-262-10066-5, 1998.
  - L.R. Rabiner and B.W. Juang, Fundamentals of Speech Recognition, Prentice-Hall, ISBN 0-13-015157-2, 1993.
  - J. Deller, et al., Discrete-Time Processing of Speech Signals, MacMillan Publishing Co., ISBN 0-7803-5386-2, 2000.
  - R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, Second Edition, Wiley Interscience, ISBN 0-471-05669-3, 2000 (supporting material available at http://rii.ricoh.com/stork/DHS.html).
  - D. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003.
- Relevant online resources
  - Intelligent Electronic Systems, http://www.cavs.msstate.edu/hse/ies, Center for Advanced Vehicular Systems, Mississippi State University, Mississippi State, Mississippi, USA, June 2005.
  - Internet-Accessible Speech Recognition Technology, http://www.cavs.msstate.edu/hse/ies/projects/speech, June 2005.
  - Speech and Signal Processing Demonstrations, http://www.cavs.msstate.edu/hse/ies/projects/speech/software/demonstrations, June 2005.
  - Fundamentals of Speech Recognition, http://www.isip.msstate.edu/publications/courses/ece_8463, September 2004.
27. Appendix: Relevant Resources
28. Appendix: Public Domain Speech Recognition Technology
- Speech recognition
  - State of the art
  - Statistical (e.g., HMM)
  - Continuous speech
  - Large vocabulary
  - Speaker independent
- Goal: Accelerate research
  - Flexibility, extensibility, modularity
  - Efficient (C, parallel processing)
  - Easy to use (documentation)
  - Toolkits, GUIs
- Benefit: Technology
  - Standard benchmarks
  - Conversational speech
29. Appendix: IES Is More Than Just Software
30. Appendix: Nonlinear Statistical Modeling of Speech
31. Appendix: An Algorithm Retrospective of HLT
- Observations
  - Information theory preceded modern computing.
  - Early research focused on basic science.
  - Computing capacity has enabled engineering methods.
  - We are now knowledge-challenged.
32. A Historical Perspective of Prominent Disciplines
- Observations
  - The field is continually accumulating new expertise.
  - As the obvious mathematical techniques are exhausted (the low-hanging fruit), there will be a return to basic science (e.g., fMRI brain activity imaging).
33. Evolution of Knowledge and Intelligence in HLT Systems
- A number of fundamental problems still remain (e.g., channel and noise robustness, less dense or less common languages).
- The solution will require approaches that use
expert knowledge from related, more dense domains
(e.g., similar languages) and the ability to
learn from small amounts of target data (e.g.,
autonomic).
34. Appendix: The Impact of Supercomputers on Research
- Total available cycles for speech research from 1983 to 1993: 90 TeraMIPS
- MS State Empire cluster (1,000 1-GHz processors): 90 TeraMIPS per day
- A Day in the Life: 24 hours of idle time on a modern supercomputer is equivalent to 10 years of speech research at Texas Instruments!
- Cost: $1M is the nominal cost for scientific computing (from a 1-MIP VAX in 1983 to a 1,000-node supercomputer)