Title: CS 551/651:
1. CS 551/651 Structure of Spoken Language
Lecture 9: The Source-Filter Model of Speech Production
John-Paul Hosom, Fall 2010
2. The Source-Filter Model
- One more model of speech: proposed in 1848 by Johannes Müller, developed by Gunnar Fant circa 1970. Also called the Acoustic Theory of Speech Production.
- The Source-Filter Model provides a static description of speech; speech dynamics are dealt with in models of coarticulation.
- According to this model, speech is defined by three parts:
  - A sound source: vibration of the vocal folds, air turbulence, or plosion
  - A tube through which the source passes: the vocal tract
  - Radiation of sound from the mouth
- These 3 components are assumed to be independent.
- We will discuss these three parts separately.
3. The Source-Filter Model: Sound Source
- Voiced Sound Source
  - produced by vibration of the vocal folds
  - several models exist that describe the flow of air through the vocal folds
  - each model describes the increase in air flow as the glottis opens, the decrease in air flow as it closes, and no air flow while the glottis remains closed during pressure buildup
  - in the spectral domain, the shape is approximately flat at very low frequencies, and has a slope of about -12 dB/octave at higher frequencies
  - Models: Rosenberg, Fant (LF model), Fujisaki (FL model), Klatt
[Figure: glottal pulse waveform with the glottis opening, glottal closure, and next glottis opening marked; vertical axis: air pressure (µPa); horizontal axis: time (msec)]
4. The Source-Filter Model: Sound Source
- Voiced Sound Source
  - models are of glottal flow
  - glottal flow is the same as volume velocity, V, in units of m³/s
  - volume velocity per unit area, or V/unit area, is in units of m/s, and is called the point velocity, v
  - acoustic pressure, p, in Pascals, equals impedance Z times v: p = Z v
  - impedance is constant for a given glottis and vocal tract
  - therefore, acoustic pressure is directly proportional to glottal flow, and so the vertical axis of these models can be considered either glottal flow, volume velocity, or acoustic pressure (in micro Pascals)
5. The Source-Filter Model: Sound Source
All models have the following parameters:
- pitch period (T0 = 1/F0)
- open quotient (OQ)
- skew (SK)
These three parameters are used in a function that describes how the sound pressure changes over time within one pitch period.
[Figure: glottal pulse with the glottis opening, glottal closure, and next glottis opening marked, annotated with T0, OQ, and SK]
OQ is measured relative to T0; SK is measured relative to OQ.
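As a rough illustration of how the three parameters pin down the timing of one pulse, the sketch below converts F0, OQ, and SK into durations. The slides only say what each quantity is measured relative to, so the exact definitions used here are assumptions: OQ is taken as the fraction of T0 during which the glottis is open, and SK as the fraction of that open phase spent opening. The function name `pulse_timing` is also hypothetical.

```python
def pulse_timing(f0_hz, oq, sk):
    """Return (t0, t_opening, t_closing, t_closed) in seconds.

    Assumed definitions (not confirmed by the slides):
      oq = open time / T0
      sk = opening time / open time
    """
    if not (f0_hz > 0.0 and 0.0 < oq <= 1.0 and 0.0 < sk < 1.0):
        raise ValueError("expected f0_hz > 0, 0 < oq <= 1, 0 < sk < 1")
    t0 = 1.0 / f0_hz              # pitch period
    t_open = oq * t0              # total time the glottis is open
    t_opening = sk * t_open       # rising part of the pulse
    t_closing = t_open - t_opening
    t_closed = t0 - t_open        # no flow while pressure builds up
    return t0, t_opening, t_closing, t_closed


# Illustrative values only: F0 = 100 Hz, OQ = 0.6, SK = 0.7
t0, t_opening, t_closing, t_closed = pulse_timing(100.0, 0.6, 0.7)
```

Under these assumed definitions, raising SK toward 1 makes the pulse rise slowly and close abruptly, while lowering OQ lengthens the closed phase.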
6. The Source-Filter Model: Sound Source
The Rosenberg model
gR(t) is a glottal pulse with amplitude A and duration T. gR(t) has three phases: the opening phase until time TO, the closing phase until time TC, and the closed phase with length T-(TO+TC).
[Figure: Rosenberg glottal pulse, annotated with TO, TC, and T]
(from http://www.physik3.gwdg.de/micha/aachen98/aachen98.html)
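The pulse formula itself did not survive in this transcript. One commonly used trigonometric form of the Rosenberg pulse is sketched below as an assumption, not necessarily the version shown on the slide: a raised-cosine rise during the opening phase, a quarter-cosine fall during the closing phase, and zero flow while closed. Here TO and TC are treated as durations, consistent with a closed phase of length T-(TO+TC); the function name is hypothetical.

```python
import math


def rosenberg_pulse(t, amplitude, t_opening, t_closing):
    """Glottal flow at time t (seconds) within one pitch period.

    Assumed trigonometric form:
      opening: A/2 * (1 - cos(pi * t / TO))       for 0 <= t <= TO
      closing: A * cos(pi/2 * (t - TO) / TC)      for TO < t <= TO + TC
      closed:  0                                  otherwise
    """
    if 0.0 <= t <= t_opening:
        return 0.5 * amplitude * (1.0 - math.cos(math.pi * t / t_opening))
    if t_opening < t <= t_opening + t_closing:
        return amplitude * math.cos(0.5 * math.pi * (t - t_opening) / t_closing)
    return 0.0


# One 10 ms period (F0 = 100 Hz) sampled at 8 kHz; values are illustrative.
fs = 8000
period = [rosenberg_pulse(n / fs, 1.0, 0.004, 0.002) for n in range(80)]
```

Repeating `period` end to end gives a simple voiced source with a 100 Hz fundamental.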
7. The Source-Filter Model: Sound Source
- The Liljencrants-Fant (LF) Model
  - uses sin() and exp() functions to create a smooth trajectory
  - many parameters allow detailed control of shape
- The Fujisaki-Ljungqvist (FL) Model
  - similar to LF, but allows negative flow during the closed phase
  - simpler polynomial functions
(from http://www.ims.uni-stuttgart.de/phonetik/EGG/page13.htm)
8. The Source-Filter Model: Sound Source
- Unvoiced Sound Source
  - produced by pushing air through a constriction in the mouth
  - a simple model: noise that decreases at 6 dB/octave
- Plosive Sound Source
  - produced by pressure buildup, then release of the constriction
  - a very simple model: approximately a step function
[Figure: step function; vertical axis: amplitude; horizontal axis: time]
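9. The Source-Filter Model: Vocal Tract Filter
The vocal tract can be modeled as a series of connected tubes with different lengths and diameters.
[Figure: vocal tract as a chain of tubes with cross-sectional areas A1 to A6; the fourth tube is annotated with its diameter d4 and length l4]
Life can be made much simpler if we start with only two tubes for approximating different vowels.
[Figure: two-tube approximations, each with areas A1 and A2, for the vowels /iy/, /uw/, /ah/, and /aa/]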
10. The Source-Filter Model: Vocal Tract Filter
An electrical-engineering analogy can be drawn between the tubes and a transmission line. From this analogy, the formant frequencies (frequencies of standing waves) occur when [equation not preserved in this transcript] (from Flanagan, p. 70-71)
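For a concrete feel of the transmission-line analogy, here is a minimal sketch for two lossless tubes. It assumes a textbook resonance condition that may differ in form from the slide's equation: with a back tube (length l1, area A1) closed at the glottis and a front tube (length l2, area A2) open at the lips, resonances occur where tan(βl1)·tan(βl2) = A2/A1, with β = 2πF/c. The code solves the equivalent form A1·sin(βl1)·sin(βl2) - A2·cos(βl1)·cos(βl2) = 0, which avoids the singularities of tan. The speed of sound, the tube dimensions, and the function name are all illustrative assumptions.

```python
import math

SPEED_OF_SOUND_CM_S = 34000.0  # assumed value, roughly 340 m/s


def two_tube_formants(l1_cm, a1_cm2, l2_cm, a2_cm2, f_max_hz=4000.0, step_hz=1.0):
    """Resonances (Hz) of a lossless two-tube model, found by scan plus bisection.

    Tube 1 is the back cavity (closed at the glottis); tube 2 is the front
    cavity (ideal open end at the lips). Losses and radiation load are ignored.
    """
    def mismatch(f_hz):
        beta = 2.0 * math.pi * f_hz / SPEED_OF_SOUND_CM_S
        return (a1_cm2 * math.sin(beta * l1_cm) * math.sin(beta * l2_cm)
                - a2_cm2 * math.cos(beta * l1_cm) * math.cos(beta * l2_cm))

    formants = []
    f_lo = step_hz
    g_lo = mismatch(f_lo)
    while f_lo < f_max_hz:
        f_hi = f_lo + step_hz
        g_hi = mismatch(f_hi)
        if g_lo == 0.0:
            formants.append(f_lo)            # root exactly on a grid point
        elif g_lo * g_hi < 0.0:
            lo, hi, g_at_lo = f_lo, f_hi, g_lo
            for _ in range(50):              # bisection inside the bracket
                mid = 0.5 * (lo + hi)
                g_mid = mismatch(mid)
                if g_at_lo * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_at_lo = mid, g_mid
            formants.append(0.5 * (lo + hi))
        f_lo, g_lo = f_hi, g_hi
    return formants


# Equal areas reduce to a single uniform 17 cm tube.
uniform = two_tube_formants(8.5, 1.0, 8.5, 1.0)
# Hypothetical /aa/-like shape: narrow back cavity, wide front cavity.
aa_like = two_tube_formants(8.5, 1.0, 8.5, 8.0)
```

With equal areas the condition collapses to cos(β(l1+l2)) = 0, so the uniform case should land near 500, 1500, and 2500 Hz. Changing the area ratio moves F1 and F2 toward or away from each other, which is how the two-tube shapes approximate different vowels.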
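11. The Source-Filter Model: Vocal Tract Filter
In the simplest case of a single tube, the formants are located at
$F_n = \frac{(2n-1)\,c}{4l}, \quad n = 1, 2, 3, \ldots$
and if l = 17 cm (the typical length of the male vocal tract) and the speed of sound c is about 340 m/s, then
$F_1 = \frac{c}{4l} = 500\ \text{Hz}, \quad F_2 = \frac{3c}{4l} = 1500\ \text{Hz}, \quad F_3 = \frac{5c}{4l} = 2500\ \text{Hz},$
etc. So, for a neutral vowel (no constriction in the vocal tract), formants occur at 500, 1500, 2500, ... Hz.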
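The single-tube case is the quarter-wavelength resonator, F_n = (2n-1)c/(4l). A small sketch of that arithmetic follows; the speed of sound (340 m/s), the shorter tract length, and the function name are assumptions chosen for illustration.

```python
def uniform_tube_formants(length_m, count=3, speed_of_sound_m_s=340.0):
    """Quarter-wavelength resonances of a tube closed at one end, open at the other."""
    return [(2 * n - 1) * speed_of_sound_m_s / (4.0 * length_m)
            for n in range(1, count + 1)]


# 17 cm tract from the slide: approximately 500, 1500, 2500 Hz.
neutral_vowel = uniform_tube_formants(0.17)
# Hypothetical shorter tract (14.5 cm): every formant scales up by 17/14.5.
shorter_tract = uniform_tube_formants(0.145)
```

Because every formant is inversely proportional to the tube length, a shorter vocal tract shifts the whole formant pattern upward by the same factor.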
14. The Source-Filter Model: Vocal Tract Filter
The two-tube model can be expanded to multiple tubes; the math becomes ugly, but the results are more realistic.
15. The Source-Filter Model: Bandwidths
- In these cases, it has been assumed that the tubes have hard surfaces, which causes the resonant frequencies (formants) to have strong energy only at their center frequencies (energy is put into the system via the source, but no energy is lost).
- In reality, the resonant energies decay over time; energy is absorbed by
  - viscosity (caused by friction of air against vocal-tract walls),
  - heat conduction (at the vocal-tract walls),
  - soft surfaces of vocal-tract walls
- these effects cause bandwidth to increase with frequency
16. The Source-Filter Model: Radiation
A final effect of the speech-production process is radiation of sound from the lips. As sound radiates from a source, its energy decreases. The decrease in energy is not the same for all frequencies; this effect can be modeled as a 6 dB/octave increase in energy which, coincidentally, is the same equation as pre-emphasis with a = 1.0, and also corresponds to a differentiation operation.
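The link between pre-emphasis, differentiation, and a 6 dB/octave rise can be checked numerically. The sketch below assumes the usual first-order pre-emphasis form y[n] = x[n] - a·x[n-1]; the slide's own equation is not reproduced in this transcript, and the function names and the 16 kHz sampling rate are illustrative.

```python
import math


def pre_emphasis(x, a=1.0):
    """y[n] = x[n] - a * x[n-1], taking x[-1] as 0. With a = 1 this is a first difference."""
    return [x[n] - (a * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def gain_db(freq_hz, fs_hz, a=1.0):
    """Magnitude response of 1 - a * exp(-j*w) in dB at the given frequency."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    h = complex(1.0 - a * math.cos(w), a * math.sin(w))
    return 20.0 * math.log10(abs(h))


# Doubling the frequency (one octave) well below fs/2 raises the gain by about 6 dB.
octave_rise_db = gain_db(200.0, 16000.0) - gain_db(100.0, 16000.0)

# A ramp has constant slope, so its first difference is constant after the first sample.
ramp_out = pre_emphasis([0.0, 1.0, 2.0, 3.0, 4.0])
```

The 6 dB/octave figure holds closely at frequencies far below the Nyquist frequency; the rise flattens as the frequency approaches fs/2.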
17. The Source-Filter Model: Radiation
The derivative effect of radiation from the lips can be moved to the glottal-source model.
[Figure: two panels: glottal flow, annotated with T0, OQ, and SK; and glottal flow derivative]
18. The Source-Filter Model: Radiation
The derivative effect of radiation from the lips can also be moved to the models of frication and plosion:
- Unvoiced Sound Source
  - a very simple model: random (white) noise
- Plosive Sound Source
  - a very simple model: an impulse function
[Figure: impulse function; vertical axis: amplitude; horizontal axis: time]
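A quick discrete-time check of why folding the derivative into the sources turns a step into an impulse and 6 dB/octave noise into white noise. This is only a sketch: the first difference stands in for differentiation, a running sum of white noise stands in for noise that falls at 6 dB/octave, and the names and the random seed are arbitrary choices.

```python
import random


def first_difference(x):
    """Discrete stand-in for differentiation: y[n] = x[n] - x[n-1], with x[-1] = 0."""
    return [x[n] - (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


# Plosive source: differentiating a step leaves a single impulse at the onset.
step = [0.0] * 5 + [1.0] * 5
impulse = first_difference(step)

# Unvoiced source: a running sum of white noise has a spectrum that falls
# at about 6 dB/octave; differencing it recovers the white noise.
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(1000)]
integrated, total = [], 0.0
for sample in white:
    total += sample
    integrated.append(total)
recovered = first_difference(integrated)
```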
19. The Source-Filter Model: Complete Picture
[Figure: spectra of the glottal source (harmonics), radiation (log scale), and vocal tract filter (envelope), combining to give the final speech signal]
20. The Source-Filter Model: Estimating Parameters
The vocal-tract parameters (formants) can be estimated using LPC analysis, with the order of LPC analysis equal to 2NF, where NF is the expected number of formants. In practice, LPC estimation of formants is not very accurate because of the slope of the spectrum and irregularities in the spectrum.
Once the formants are determined, they can then be inverted, and the original signal filtered with the inverted formants to obtain the source + radiation (first derivative of glottal flow) signal. This is called inverse filtering.
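The LPC and inverse-filtering steps described above can be sketched on a toy signal. This is one possible approach under stated assumptions, not the course's reference implementation: autocorrelation-method LPC solved with the Levinson-Durbin recursion, formants read off as peaks of the LPC envelope, and a synthetic all-pole "vowel" with two made-up resonances (500 Hz and 1500 Hz) driven by a single impulse. All function names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    return [sum(x[i] * x[i + k] for i in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return a[0..order] with a[0] = 1, so that A(z) = sum_k a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """Residual e[n] = sum_k a[k] x[n-k]: filter the signal with A(z)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


def envelope_peaks(a, fs_hz, step_hz=5.0):
    """Frequencies (Hz) where the LPC envelope 1/|A(e^jw)| has a local maximum."""
    freqs, mags = [], []
    f = step_hz
    while f < fs_hz / 2.0:
        w = 2.0 * math.pi * f / fs_hz
        denom = sum(a[k] * complex(math.cos(k * w), -math.sin(k * w))
                    for k in range(len(a)))
        freqs.append(f)
        mags.append(1.0 / abs(denom))
        f += step_hz
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] >= mags[i + 1]]


def resonator_poly(f_hz, bw_hz, fs_hz):
    r = math.exp(-math.pi * bw_hz / fs_hz)
    return [1.0, -2.0 * r * math.cos(2.0 * math.pi * f_hz / fs_hz), r * r]


def poly_mul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


# Toy all-pole "vowel": two hypothetical formants, excited by one impulse.
fs = 8000.0
a_true = poly_mul(resonator_poly(500.0, 80.0, fs), resonator_poly(1500.0, 100.0, fs))
signal = []
for n in range(800):
    excitation = 1.0 if n == 0 else 0.0
    signal.append(excitation - sum(a_true[k] * signal[n - k]
                                   for k in range(1, 5) if n - k >= 0))

order = 4                                    # 2 * NF with NF = 2
a_est = levinson_durbin(autocorrelation(signal, order), order)
formants = envelope_peaks(a_est, fs)
residual = inverse_filter(signal, a_est)     # should be close to the impulse
```

On this idealized signal the residual collapses back to the exciting impulse. On real speech the residual is only an estimate of the source + radiation signal, and the estimated formants depend on the chosen LPC order.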
21. The Source-Filter Model: Filtering
Formants can be modeled by a damped sinusoid, which has the following representations: [equations not preserved in this transcript] where S(f) is the spectrum at frequency value f, A is overall amplitude, fc is the center frequency of the damped sine wave, and the remaining parameter is a damping factor. (Olive, p. 48, 58.)
Or, given formant and sampling frequency, compute IIR filter coefficients: [equations not preserved in this transcript] (from Klatt, 1980)
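The coefficient equations are missing from this transcript. The sketch below uses the second-order digital resonator as it is usually quoted from Klatt (1980), y[n] = A·x[n] + B·y[n-1] + C·y[n-2], with C = -exp(-2π·BW·T), B = 2·exp(-π·BW·T)·cos(2π·F·T), and A = 1 - B - C, where T is the sampling period. Note that it also needs a bandwidth BW, which the sentence above does not mention. The function names and the example values (500 Hz formant, 60 Hz bandwidth, 10 kHz sampling rate) are illustrative assumptions.

```python
import math


def klatt_resonator_coeffs(formant_hz, bandwidth_hz, fs_hz):
    """Coefficients (A, B, C) of y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * formant_hz * t)
    a = 1.0 - b - c                          # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(x, coeffs):
    """Run the input samples through the two-pole resonator."""
    a, b, c = coeffs
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


coeffs = klatt_resonator_coeffs(500.0, 60.0, 10000.0)
impulse_response = resonate([1.0] + [0.0] * 999, coeffs)   # a damped sinusoid
dc_response = resonate([1.0] * 5000, coeffs)               # settles at unity gain
```

The impulse response is the damped sinusoid described above: it oscillates at the formant frequency and decays at a rate set by the bandwidth. Cascading several such resonators, one per formant, gives a simple vocal-tract filter.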
22. The Source-Filter Model
- A course project that studies the source-filter model might be interesting:
  - Implement LPC, extract formant values and bandwidths of different vowels; how do envelope and formant values change with different orders of LPC (values of p)?
  - Do LPC analysis, then inverse filter the signal to extract the glottal source waveform. Does it look the way it should?
  - Construct two-tube models, predict formant frequencies of all vowels.
- If you're more comfortable with programming, signal processing, etc.