Provided by: pcxp


Natural Language Understanding
  • Raivydas Simenas

  • History
  • Speech Recognition
  • Natural Language Understanding
  • statistical methods to resolve ambiguities
  • Current situation

History
  • Roots in teaching the deaf to speak using
    visible speech
  • 1874: Alexander Bell's invention of harmonic
  • Different frequency harmonics from an electrical
    signal could be separated
  • Could send multiple messages over the same wire
    at the same time
  • 1940s: separating the speech signal into
    different frequency components using the
  • 1950s: the beginning of computer use for
    automatic speech recognition

The Nature of Speech
  • Phoneme: a basic sound, e.g. a vowel
  • The complexity of the human vocal apparatus: about
    18 phonemes per second
  • Speech viewed as a sound wave
  • Identifying sounds: analyzing the sound wave into
    its frequency components

The Spectrogram I
  • A visual representation of speech which contains
    all the salient information
  • Plots the amount of energy at different
    frequencies against time
  • Discontinuous speech (making a pause after each
    word) is easier to recognize on the spectrogram
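
The idea of plotting energy at different frequencies against time can be sketched in a few lines. This is a minimal illustration, not the method any particular recognizer uses: it cuts the signal into overlapping frames and applies a naive discrete Fourier transform to each. The function name `spectrogram`, the frame length, the hop size, and the 1000 Hz test tone are all assumptions made for the example.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of per-frequency energies for each time frame."""
    # A Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        energies = []
        # Naive DFT; a real signal only needs bins 0 .. frame_len / 2.
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            energies.append(abs(coeff) ** 2)
        frames.append(energies)
    return frames

# Hypothetical test signal: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)

# The strongest bin of the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(peak_hz)
```

Each row of `spec` is one time slice and each column one frequency band, so rendering the rows side by side as an image gives the familiar spectrogram picture. Production code would use a fast Fourier transform rather than the quadratic loop shown here.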

The Spectrogram II
  • The same word uttered twice (especially by
    different speakers, i.e. speaker independence) might
    look radically different on a spectrogram
  • The need to recognize invariant features in a
    spectrogram
  • Formants: resonant frequencies sustained for a
    short time period in pronouncing a vowel
  • Normalization: distinguishing between relevant
    and irrelevant information
  • Nonlinear time compression: taking care of the
    changing speed of speech
  • Matching a spoken word to a template
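
One common way to combine nonlinear time alignment with template matching is dynamic time warping (DTW). The slides do not name a specific algorithm, so treat the following as an assumed stand-in. The one-dimensional feature sequences and the `templates` dictionary are made up for illustration; a real system would compare vectors of spectral features.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Stretch a, stretch b, or advance both sequences together.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical one-dimensional "feature tracks" for two word templates.
templates = {"yes": [1, 3, 4, 3, 1], "no": [4, 4, 1, 1, 1]}

# The same shape as "yes", but spoken more slowly.
utterance = [1, 1, 3, 3, 4, 4, 3, 1]

best = min(templates, key=lambda w: dtw_distance(utterance, templates[w]))
print(best)
```

Because the warping path may repeat elements of either sequence, a slow and a fast rendition of the same word can align with zero cost, which is the effect the time-compression bullet describes.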

Robust Speech Recognition
  • Need to maintain accuracy when the quality of the
    input speech is degraded or when the speech
    characteristics differ due to change in
    environment or speakers
  • Dynamic parameter adaptation: either alter the
    input signal or the internally stored
  • Optimal parameter estimation: based on a
    statistical model characterizing the differences
    between training and test sets
  • Empirical feature comparison: based on comparison
    between high-quality speech and the same speech
    recorded under degraded conditions

Stochastic Methods in Speech Recognition
  • Generating the sequence of word hypotheses for an
    acoustic signal is most often done using
  • The process
  • A sequence of acoustic signals is represented
    using a collection of vectors
  • Such collections are used to build acoustic word
    models, which consist of probabilities of certain
    sequences of vectors representing a word
  • Acoustic word models utilize Markov chains

Representing Sentences
  • Syntactic form: indicates the way the words are
    related to each other in a sentence
  • Logical form: identifying the semantic
    relationships between words based solely on the
    knowledge of the language (independently of the
    context)
  • Final meaning representation: mapping the
    information from the syntactic and logical form
    into knowledge representation
  • System uses knowledge representation to represent
    and reason about its application domain

Parsing a Sentence
  • Parsing: determining the structure of the
    sentence according to the grammar
  • Tree representation of a sentence
  • Transition network grammars
  • Start with initial node
  • Can traverse an arc only if it is labeled with an
    appropriate category
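
A transition network can be sketched as a table of labeled arcs. The network below (article, any number of adjectives, then a noun) and the small lexicon are hypothetical examples, not taken from the slides. The recognizer starts at the initial node and follows an arc only when the next word can belong to the arc's category.

```python
# Hypothetical network for a simple noun phrase: ART ADJ* N
NETWORK = {
    "start": [("ART", "np1")],
    "np1": [("ADJ", "np1"), ("N", "end")],
}
FINAL_NODES = {"end"}

# Hypothetical lexicon: each word maps to its possible lexical categories.
LEXICON = {
    "the": {"ART"},
    "a": {"ART"},
    "old": {"ADJ", "N"},
    "red": {"ADJ"},
    "man": {"N", "V"},
    "boat": {"N"},
}

def accepts(words, node="start"):
    """True if the words can be consumed along arcs ending in a final node."""
    if not words:
        return node in FINAL_NODES
    categories = LEXICON.get(words[0], set())
    for label, target in NETWORK.get(node, []):
        # An arc may be traversed only if the word fits its category.
        if label in categories and accepts(words[1:], target):
            return True
    return False

print(accepts(["the", "old", "man"]))
print(accepts(["old", "man"]))
```

The recursion backtracks over ambiguous words such as "old", which is why the same input can have more than one successful path through the network.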

Stochastic Methods for Ambiguity Resolution I
  • Some sentences can be parsed many different
    ways, e.g. "time flies like an arrow"
  • The most popular method for this is based on
  • Some facts from probability theory
  • The concept of the random variable, e.g. the
    lexical category of "flies"
  • Probability function
  • assigns a probability to every possible value of
    the random variable, e.g. 0.3 for "flies" being a
    noun, 0.7 for its being a verb
  • Conditional probability functions, Pr(A | B), e.g.
    the probability for the occurrence of a verb
    given the fact that a noun already occurred
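
A conditional probability such as "a verb occurs given that a noun just occurred" can be estimated from counts of adjacent category pairs. The tag sequence below is a made-up toy corpus, and the helper name `conditional` is an assumption for this sketch.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical sequence of lexical categories from a tiny tagged text.
tags = ["ART", "N", "V", "ART", "N", "P", "ART", "N", "V", "N"]

# How often each ordered pair occurs, and how often each category
# appears as the first member of a pair.
pair_counts = Counter(zip(tags, tags[1:]))
prev_counts = Counter(tags[:-1])

def conditional(cat, prev):
    """Estimate Pr(cat | prev) as count(prev, cat) / count(prev)."""
    return Fraction(pair_counts[(prev, cat)], prev_counts[prev])

print(conditional("V", "N"))    # 2/3
print(conditional("N", "ART"))  # 1
```

In this toy corpus a noun is followed by a verb in two of its three occurrences as a predecessor, hence the estimate of 2/3.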

Stochastic Methods for Ambiguity Resolution II
  • Probabilities are used to predict future events
    given some data about the past
  • Maximum likelihood estimator (MLE)
  • Probability of X happening in the future = number
    of cases of X happening in the past / total number
    of events in the past
  • Works well only if X occurred often, not very
    useful for low-frequency events
  • Expected likelihood estimator (ELE)
  • Probability of X happening in the future =
    f(number of cases of X happening in the
    past) / Sum(f(number of cases of some event
    happening in the past)), e.g. if
    f(Pr(X)) = Pr(X) + 0.5 and we know that Pr(X) = 0.4 and
    Pr(Y) = 0.6, then ELE(X) = (0.4 + 0.5) /
    ((0.4 + 0.5) + (0.6 + 0.5)) = 0.45
  • MLE is a special case of ELE, i.e. for MLE
    f(x) = x
  • Given a large amount of text, one can use MLE or
    ELE to determine the lexical category of an
    ambiguous word, e.g. the word "flies"
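
The two estimators can be compared side by side. This sketch assumes the common ELE weighting f(c) = c + 0.5, and the counts for "flies" are invented so that the MLE values match the 0.3 / 0.7 split used earlier in the slides. The function names and the extra `ADJ` category are illustrative.

```python
def mle(counts):
    """Maximum likelihood estimate: count / total."""
    total = sum(counts.values())
    return {event: c / total for event, c in counts.items()}

def ele(counts, offset=0.5):
    """Expected likelihood estimate with f(c) = c + offset (assumed 0.5)."""
    total = sum(c + offset for c in counts.values())
    return {event: (c + offset) / total for event, c in counts.items()}

# Hypothetical counts of the lexical category of "flies" in a corpus.
flies_counts = {"N": 3, "V": 7, "ADJ": 0}

print(mle(flies_counts))
print(ele(flies_counts))

# With offset 0 the ELE reduces to the MLE.
print(ele(flies_counts, offset=0) == mle(flies_counts))
```

The unseen category gets probability zero under MLE but a small positive value under ELE, which is why ELE behaves better for low-frequency events.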

Stochastic Methods for Ambiguity Resolution III
  • Always choosing the interpretation that occurs
    most frequently in the training set obtains a 90%
    success rate on average (not good)
  • Some of the local context should be used to
    determine the lexical category of a word
  • Ideally, for a sequence of words w1, w2, ..., wn we
    want a lexical category sequence c1, c2, ..., cn which
    maximizes the probability of the right interpretation
  • In practice, approximations of such probabilities
    are made
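
The context-free baseline mentioned above, always picking a word's most frequent category, fits in a few lines. The training pairs are invented, and the sketch assumes every word to be tagged was seen in training.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (word, lexical category) pairs.
training = [
    ("flies", "V"), ("flies", "V"), ("flies", "N"),
    ("like", "P"), ("like", "V"), ("like", "P"),
    ("the", "ART"),
]

category_counts = defaultdict(Counter)
for word, category in training:
    category_counts[word][category] += 1

def most_frequent_category(word):
    """Ignore context and return the category seen most often for word."""
    return category_counts[word].most_common(1)[0][0]

tagged = [most_frequent_category(w) for w in ["flies", "like", "the"]]
print(tagged)  # ['V', 'P', 'ART']
```

Because the choice never depends on neighbouring words, this tagger assigns "flies" the same category in every sentence, which is the limitation that motivates using local context.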

Stochastic Methods for Ambiguity Resolution IV
  • n-gram models
  • Look at the probability of a lexical category Ci
    which follows the sequence of lexical categories
  • Probability of c1, c2, ..., ck occurring is
    approximately the product of n-gram probabilities
    for each word, e.g. the probability of a sequence
    ART, N, V is 0.71 × 1 × 0.43 = 0.3053
  • In practice, bigram or trigram models are used
    most often
  • The models capturing the concept are called
    Hidden Markov Models
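
The product in the ART, N, V example can be reproduced with a bigram table. Reading the three factors as Pr(ART | start), Pr(N | ART) and Pr(V | N) is an assumption of this sketch, as are the `<start>` marker and the function name.

```python
# Bigram probabilities Pr(category | previous category).
# The assignment of 0.71, 1 and 0.43 to these pairs is assumed.
BIGRAM = {
    ("<start>", "ART"): 0.71,
    ("ART", "N"): 1.0,
    ("N", "V"): 0.43,
}

def sequence_probability(categories, bigram=BIGRAM):
    """Approximate Pr(c1, ..., ck) as a product of bigram probabilities."""
    probability = 1.0
    previous = "<start>"
    for category in categories:
        # Pairs missing from the table are treated as impossible here.
        probability *= bigram.get((previous, category), 0.0)
        previous = category
    return probability

print(round(sequence_probability(["ART", "N", "V"]), 4))  # 0.3053
```

A trigram model would condition each category on the two preceding ones instead of one, at the cost of needing far more training data to estimate the table.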

Stochastic Methods for Ambiguity Resolution V
  • In order to determine the most likely
    interpretation of a given sequence of n words, we
    want to maximize the value of
  • The Viterbi algorithm
  • Given k lexical categories, the total number of
    possibilities to consider for a sequence of n
    words is k^n
  • The Viterbi algorithm reduces this number to
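
A compact version of the Viterbi algorithm for a bigram model is sketched below. It assumes the score being maximized is the product, over all words, of Pr(category | previous category) and Pr(word | category). All probability values are illustrative numbers rather than data from the slides, and the input is assumed to be non-empty.

```python
def viterbi(words, categories, trans, emit, start="<start>"):
    """Return (probability, category path) of the best tagging of words."""
    # best[c] holds the best-scoring path that ends in category c.
    best = {
        c: (trans.get((start, c), 0.0) * emit.get((words[0], c), 0.0), [c])
        for c in categories
    }
    for word in words[1:]:
        new_best = {}
        for c in categories:
            # Only the best predecessor for each category is kept.
            prob, path = max(
                ((best[p][0] * trans.get((p, c), 0.0), best[p][1])
                 for p in categories),
                key=lambda item: item[0],
            )
            new_best[c] = (prob * emit.get((word, c), 0.0), path + [c])
        best = new_best
    return max(best.values(), key=lambda item: item[0])

# Illustrative (made-up) model parameters.
CATEGORIES = ["N", "V", "ART", "P"]
TRANS = {
    ("<start>", "ART"): 0.71, ("<start>", "N"): 0.29,
    ("ART", "N"): 1.0,
    ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
    ("V", "N"): 0.35, ("V", "ART"): 0.65,
    ("P", "ART"): 0.74, ("P", "N"): 0.26,
}
EMIT = {
    ("flies", "N"): 0.025, ("flies", "V"): 0.076,
    ("like", "V"): 0.1, ("like", "P"): 0.068, ("like", "N"): 0.012,
    ("a", "ART"): 0.36, ("a", "N"): 0.001,
    ("flower", "N"): 0.063, ("flower", "V"): 0.05,
}

probability, path = viterbi(["flies", "like", "a", "flower"],
                            CATEGORIES, TRANS, EMIT)
print(path)  # ['N', 'V', 'ART', 'N']
```

For each word the loop examines every pair of current and previous category, so the work grows with n times k squared rather than with the number of complete category sequences.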

Logical Form
  • Although interpreting a sentence often requires
    knowledge of the context, some interpretation can
    be done independently of it
  • basic semantic properties of a word, its
    different senses etc.
  • Ontology
  • each word has 1 or more senses in which it can be
    used, e.g. "go" has about 40 senses
  • the different senses of all the words of a
    natural language are organized into classes of
    objects, such as events, actions etc.
  • the set of such classes is called an ontology
  • Logical form of an utterance can be viewed as a
    function that maps current discourse situation
    into a new one resulting from the occurrence of
    the utterance

Current Situation
  • Inexpensive software for speech recognition
  • The issues: large vocabulary, continuous speech
    and speaker independence
  • Automated speech recognition for restricted
  • The speed of serial processes in a computer vs.
    the number of parallel processes in human brain

  • Survey of the State of the Art in Human Language
    Technology, edited by Ronald A. Cole, 1996
  • James Allen. Natural Language Understanding, 1995
  • Raymond Kurzweil. When Will HAL Understand What
    We Are Saying? Computer Speech Recognition and
    Understanding. Taken from HAL's Legacy, 1996