CS 551/651: - PowerPoint PPT Presentation

Title: No Slide Title Author: hosom Last modified by: h Created Date: 12/24/1999 12:49:58 AM Document presentation format: On-screen Show Other titles – PowerPoint PPT presentation

1
CS 551/651: Structure of Spoken Language
Lecture 9: The Source-Filter Model of Speech Production
John-Paul Hosom, Fall 2010
2
The Source-Filter Model
  • One more model of speech: proposed in 1848 by Johannes
    Müller, developed by Gunnar Fant circa 1970. Also called
    the Acoustic Theory of Speech Production.
  • The Source-Filter Model provides a static description of
    speech; speech dynamics are dealt with in models of
    coarticulation.
  • According to this model, speech is defined by three parts:
      • a sound source: vibration of the vocal folds, air
        turbulence, or plosion
      • a tube through which the source passes: the vocal tract
      • radiation of sound from the mouth
  • These 3 components are assumed to be independent.
  • We will discuss these three parts separately.

3
The Source-Filter Model: Sound Source
  • Voiced Sound Source
      • produced by vibration of the vocal folds
      • several models exist that describe the flow of air
        through the vocal folds
      • each model describes the increase in air flow as the
        glottis opens, the decrease in air flow as it closes,
        and no air flow while the glottis remains closed
        during pressure buildup
      • in the spectral domain, the shape is approximately flat
        at very low frequencies and falls off at about
        12 dB/octave at higher frequencies
      • models: Rosenberg, Fant (LF model), Fujisaki (FL
        model), Klatt

[Figure: glottal pulse waveform with glottis opening, glottal closure, and glottis opening marked; axes: air pressure (µPa) vs. time (msec)]
4
The Source-Filter Model: Sound Source
  • Voiced Sound Source
      • the models are of glottal flow
      • glottal flow is the same as volume velocity, V, in
        units of m³/s
      • volume velocity per unit area, V/area, is in units of
        m/s and is called the point velocity, v
      • acoustic pressure, p, in Pascals, equals impedance Z
        times v: p = Z v
      • impedance is constant for a given glottis and vocal
        tract
      • therefore, acoustic pressure is directly proportional
        to glottal flow, and so the vertical axis of these
        models can be read as glottal flow, volume velocity,
        or acoustic pressure (in micropascals)

5
The Source-Filter Model: Sound Source
All models have the following parameters: pitch
period (T0 = 1/F0), open quotient (OQ), and skew
(SK). These three parameters are used in a
function that describes how the sound pressure
changes over time within one pitch period.
OQ is measured relative to T0; SK is measured
relative to OQ.
[Figure: one glottal cycle (glottis opening, glottal closure, glottis opening) annotated with T0, OQ, and SK]
6
The Source-Filter Model: Sound Source
The Rosenberg model
gR(t) is a glottal pulse with amplitude A and
duration T. gR(t) has three phases: the opening
phase until time TO, the closing phase of length
TC (until time TO + TC), and the closed phase
with length T - (TO + TC).
[Figure: Rosenberg glottal pulse with TO, TC, and T marked]
(from http://www.physik3.gwdg.de/micha/aachen98/aachen98.html)
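The three phases can be sketched in a few lines of Python. The slide's own formula is not preserved in this transcript, so the raised-cosine opening and quarter-cosine closing used here are an assumption based on a commonly cited form of the Rosenberg pulse. The function names and the default phase fractions (40% opening, 16% closing) are hypothetical choices for illustration.

```python
import math


def rosenberg_pulse(t, amplitude, t_open, t_close):
    """Glottal flow at time t (seconds) measured from the start of a pitch period.

    Assumed shape: raised cosine while opening, quarter cosine while closing.
    """
    if 0.0 <= t <= t_open:
        # Opening phase: rises smoothly from 0 to the full amplitude at t_open.
        return 0.5 * amplitude * (1.0 - math.cos(math.pi * t / t_open))
    if t_open < t <= t_open + t_close:
        # Closing phase: falls from the full amplitude to 0 at t_open + t_close.
        return amplitude * math.cos(0.5 * math.pi * (t - t_open) / t_close)
    # Closed phase: no flow for the remaining T - (TO + TC).
    return 0.0


def rosenberg_period(f0, fs, amplitude=1.0, open_frac=0.4, close_frac=0.16):
    """One pitch period of samples; the phase fractions are illustrative only."""
    period = 1.0 / f0
    n_samples = int(round(fs / f0))
    return [
        rosenberg_pulse(i / fs, amplitude, open_frac * period, close_frac * period)
        for i in range(n_samples)
    ]


# One period of a 100 Hz pulse sampled at 16 kHz.
one_period = rosenberg_period(100.0, 16000.0)
```

Repeating `one_period` end to end gives a simple voiced source; changing `f0` changes the pitch period T0.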
7
The Source-Filter Model: Sound Source
  • The Liljencrants-Fant (LF) Model
      • uses sin() and exp() functions to create a smooth
        trajectory
      • many parameters allow detailed control of shape
  • The Fujisaki-Ljungqvist (FL) Model
      • similar to LF, but allows negative flow during the
        closed phase
      • simpler polynomial functions

(from http://www.ims.uni-stuttgart.de/phonetik/EGG/page13.htm)
8
The Source-Filter Model: Sound Source
  • Unvoiced Sound Source
      • produced by pushing air through a constriction in the
        mouth
      • a simple model: noise that decreases at 6 dB/octave
  • Plosive Sound Source
      • produced by pressure buildup, then release of the
        constriction
      • a very simple model: approximately a step function

[Figure: source waveforms; axes: amplitude vs. time]
9
The Source-Filter Model: Vocal Tract Filter
The vocal tract can be modeled as a series of
connected tubes with different lengths and
diameters.
[Figure: six connected tubes with cross-sectional areas A1 to A6; the diameter d4 and length l4 are labeled]
Life can be made much simpler if we start
with only two tubes for approximating different
vowels.
[Figure: two-tube approximations, each with areas A1 and A2, for the vowels /iy/, /uw/, /ah/, and /aa/]
10
The Source-Filter Model: Vocal Tract Filter
An electrical-engineering analogy can be drawn
between the tubes and a transmission
line. From this analogy, the formant
frequencies (frequencies of standing waves) occur
when
[Equation not preserved in this transcript]
where
[Symbol definitions not preserved in this transcript]
(from Flanagan, p. 70-71)
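The resonance condition shown on the slide is not preserved here, so the following is only a sketch under stated assumptions. For a lossless two-tube model whose back tube (length l1, area A1) is closed at the glottis and whose front tube (length l2, area A2) is open at the lips, matching impedances at the junction gives the condition tan(βl1)·tan(βl2) = A2/A1, with β = 2πf/c. Whether this matches the slide's notation is an assumption, as are the function name, the speed of sound of 340 m/s, and the example dimensions.

```python
import math


def two_tube_formants(l1, a1, l2, a2, c=340.0, f_max=4000.0, step=1.0):
    """Resonances (Hz) of a lossless two-tube model, found numerically.

    Tube 1 (length l1, area a1) is at the glottis end and assumed closed;
    tube 2 (length l2, area a2) is at the lip end and assumed open.
    Lengths are in metres; only the area ratio matters.
    """

    def g(f):
        # tan(b*l1) * tan(b*l2) = a2/a1, multiplied through by the cosines
        # so that the function has no poles.
        beta = 2.0 * math.pi * f / c
        return (a2 * math.cos(beta * l1) * math.cos(beta * l2)
                - a1 * math.sin(beta * l1) * math.sin(beta * l2))

    formants = []
    k = 0
    while (k + 1.5) * step <= f_max:
        # Half-step offset keeps round-number roots away from the grid points.
        lo, hi = (k + 0.5) * step, (k + 1.5) * step
        if g(lo) * g(hi) < 0.0:  # a sign change brackets one resonance
            for _ in range(50):  # refine by bisection
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            formants.append(0.5 * (lo + hi))
        k += 1
    return formants


# Equal areas reduce to a single 17 cm tube.
neutral = two_tube_formants(0.085, 1.0, 0.085, 1.0)

# Hypothetical /aa/-like shape: narrow back tube, wide front tube.
aa_like = two_tube_formants(0.085, 1.0, 0.085, 7.0)
```

With equal areas the condition collapses to the single-tube case, which is a useful sanity check before trying vowel-specific lengths and areas.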
11
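The Source-Filter Model: Vocal Tract Filter
In the simplest case of a single tube, the
formants are located at
$$F_n = \frac{(2n - 1)\,c}{4l}, \quad n = 1, 2, 3, \ldots$$
where c is the speed of sound, and if l = 17 cm
(the typical length of the male vocal tract) and
c = 340 m/s, then
$$F_1 = \frac{340}{4 \times 0.17} = 500\ \text{Hz}, \quad F_2 = 1500\ \text{Hz}, \quad F_3 = 2500\ \text{Hz},$$
etc. So, for a neutral vowel (no constriction in
the vocal tract), formants occur at 500, 1500,
2500, ... Hz.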
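A small sketch of this calculation, assuming a lossless uniform tube that is closed at the glottis and open at the lips, so that resonances fall at odd multiples of c/(4l). The speed of sound of 340 m/s is an assumption chosen because it reproduces the 500 Hz figure; the function name and the 14.5 cm length are hypothetical.

```python
def single_tube_formants(length_m, count=3, speed_of_sound=340.0):
    """Quarter-wavelength resonances (Hz) of a uniform tube of the given length."""
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


# 17 cm tract: values close to 500, 1500, and 2500 Hz.
male = single_tube_formants(0.17)

# A hypothetical shorter tract: every formant moves up by the same ratio.
shorter = single_tube_formants(0.145)
```

Because each formant scales with 1/l, a shorter tract raises all formants together rather than changing their spacing pattern.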
12
The Source-Filter Model: Vocal Tract Filter
13
The Source-Filter Model: Vocal Tract Filter
14
The Source-Filter Model: Vocal Tract Filter
The two-tube model can be expanded to multiple
tubes; the math becomes ugly, but the results are
more realistic.
15
The Source-Filter Model: Bandwidths
  • In these cases, it has been assumed that the tubes have
    hard surfaces, which causes the resonant frequencies
    (formants) to have strong energy only at their center
    frequencies (energy is put into the system via the
    source, but no energy is lost).
  • In reality, the resonant energies decay over time; energy
    is absorbed by
      • viscosity (caused by friction of air against
        vocal-tract walls),
      • heat conduction (at the vocal-tract walls),
      • soft surfaces of vocal-tract walls.
  • These effects cause bandwidth to increase with frequency.

16
The Source-Filter Model: Radiation
A final effect of the speech-production process
is radiation of sound from the lips. As sound
radiates from a source, its energy
decreases. The decrease in energy is not the
same for all frequencies; this effect can be
modeled as a 6 dB/octave increase
in energy, which, coincidentally, uses the
same equation as pre-emphasis with a = 1.0, and
also corresponds to a differentiation operation.
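To make the link between pre-emphasis, differentiation, and the 6 dB/octave rise concrete, here is a minimal sketch. It assumes the usual first-difference form of pre-emphasis, y[n] = x[n] - a·x[n-1]; the slide's own equation is not preserved in this transcript, and the function names are hypothetical.

```python
import cmath
import math


def pre_emphasis(x, a=1.0):
    """First difference y[n] = x[n] - a * x[n-1], taking x[-1] as 0."""
    return [x[n] - (a * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def gain_db(f, fs, a=1.0):
    """Gain in dB of the filter 1 - a * z^-1 at frequency f (Hz, f > 0)."""
    w = 2.0 * math.pi * f / fs
    return 20.0 * math.log10(abs(1.0 - a * cmath.exp(-1j * w)))


# With a = 1.0 the filter is a discrete derivative: a step becomes an impulse.
step = [0, 0, 1, 1, 1]
impulse = pre_emphasis(step)

# Well below the Nyquist frequency, doubling the frequency adds about 6 dB.
octave_rise = gain_db(200.0, 16000.0) - gain_db(100.0, 16000.0)
```

The step-to-impulse behavior is the same operation that lets the radiation effect be folded into the source models.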
17
The Source-Filter Model: Radiation
The derivative effect of radiation from the lips
can be moved to the glottal-source model.
[Figure: glottal flow (annotated with T0, OQ, and SK) and the corresponding glottal flow derivative]
18
The Source-Filter Model: Radiation
The derivative effect of radiation from the lips
can also be moved to the models of frication and
plosion.
  • Unvoiced Sound Source
      • a very simple model: random (white) noise
  • Plosive Sound Source
      • a very simple model: an impulse function

[Figure: source waveforms; axes: amplitude vs. time]
19
The Source-Filter Model: Complete Picture
[Figure: spectra of the glottal source (harmonics), radiation (log scale), and vocal tract filter (envelope), which combine into the final speech signal]
20
The Source-Filter Model: Estimating Parameters
The vocal-tract parameters (formants) can be
estimated using LPC analysis, with the order of
LPC analysis equal to 2NF, where NF is the
expected number of formants. In practice, LPC
estimation of formants is not very
accurate because of the slope of the spectrum and
irregularities in the spectrum. Once the
formants are determined, they can then be
inverted, and the original signal filtered with
the inverted formants to obtain the source plus
radiation (first derivative of glottal flow)
signal. This is called inverse filtering.
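The procedure can be sketched as follows. This is one possible implementation, not necessarily the one intended in the lecture: it assumes the autocorrelation method with the Levinson-Durbin recursion, estimates formants by picking peaks in the LPC envelope (root-finding on the LPC polynomial is a common alternative), and applies the LPC polynomial as an FIR filter for the inverse-filtering step. All function names are hypothetical.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Autocorrelation of x at lags 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def lpc(x, n_formants):
    """LPC coefficients [1, a1, ..., ap] with order p = 2 * n_formants."""
    order = 2 * n_formants
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this stage of Levinson-Durbin.
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def envelope_peaks(a, fs, n_points=2000):
    """Frequencies (Hz) of local maxima of the LPC envelope 1/|A(e^jw)|."""
    mags = []
    for i in range(n_points):
        w = math.pi * i / n_points
        denom = sum(a[k] * cmath.exp(-1j * w * k) for k in range(len(a)))
        mags.append(1.0 / abs(denom))
    return [0.5 * fs * i / n_points
            for i in range(1, n_points - 1)
            if mags[i - 1] < mags[i] > mags[i + 1]]


def inverse_filter(x, a):
    """Apply A(z) as an FIR filter: e[n] = sum over k of a[k] * x[n-k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]
```

For a frame of speech `frame` sampled at `fs`, `a = lpc(frame, 4)` fits an order-8 model, `envelope_peaks(a, fs)` gives formant candidates, and `inverse_filter(frame, a)` gives an estimate of the source-plus-radiation signal. In practice the frame is usually windowed and pre-emphasized first.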
21
The Source-Filter Model: Filtering
Formants can be modeled by a damped sinusoid,
which has the following representations:
[Equations not preserved in this transcript]
where S(f) is the spectrum at frequency value f, A is
the overall amplitude, fc is the center frequency of
the damped sine wave, and the remaining parameter
(its symbol is not preserved) is a damping factor
(Olive, p. 48, 58). Or, given the formant and
sampling frequency, compute IIR filter
coefficients:
[Equations not preserved in this transcript]
(from Klatt, 1980)
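The coefficient equations themselves are not preserved in this transcript. As a sketch, the second-order digital resonator as it is commonly given for Klatt (1980) is y[n] = A·x[n] + B·y[n-1] + C·y[n-2], with C = -exp(-2π·BW·T), B = 2·exp(-π·BW·T)·cos(2π·F·T), and A = 1 - B - C, where T is the sampling period, F the formant frequency, and BW its bandwidth. Treat this form as an assumption about what the slide showed; the function names and bandwidth values are hypothetical.

```python
import math


def klatt_resonator_coeffs(formant_hz, bandwidth_hz, fs):
    """Coefficients (A, B, C) of y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * formant_hz * t))
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(x, coeffs):
    """Run the two-pole resonator over the samples in x."""
    a, b, c = coeffs
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y0 = a * sample + b * y1 + c * y2
        out.append(y0)
        y2, y1 = y1, y0
    return out


# Cascade three resonators at the neutral-vowel formants (500, 1500, 2500 Hz),
# driven by a unit impulse as a very simple source. Bandwidths are illustrative.
fs = 10000
signal = [1.0] + [0.0] * 999
for formant, bandwidth in [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]:
    signal = resonate(signal, klatt_resonator_coeffs(formant, bandwidth, fs))
```

Replacing the impulse with a periodic glottal-flow-derivative waveform would give a rough vowel-like signal with the same formant structure.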
22
The Source-Filter Model
  • A course project that studies the source-filter model might
    be interesting if you're more comfortable with programming,
    signal processing, etc.:
      • Implement LPC, and extract formant values and bandwidths
        of different vowels. How do the envelope and formant
        values change with different orders of LPC (values
        of p)?
      • Do LPC analysis, then inverse filter the signal to
        extract the glottal source waveform. Does it look the
        way it should?
      • Construct two-tube models, and predict the formant
        frequencies of all vowels.