CS 551/651: - PowerPoint PPT Presentation

Author: hosom. Last modified by: h. Created: 12/24/1999 12:49:58 AM. Presentation format: On-screen Show.

Slides: 23

Provided by: hosom

1
CS 551/651: Structure of Spoken Language
Lecture 9: The Source-Filter Model of Speech Production
John-Paul Hosom, Fall 2010
2
The Source-Filter Model

One more model of speech, proposed in 1848 by Johannes Müller and developed by Gunnar Fant circa 1970. Also called the Acoustic Theory of Speech Production.
The Source-Filter Model provides a static description of speech; speech dynamics are dealt with in models of coarticulation.
According to this model, speech is defined by three parts:
1. A sound source: vibration of the vocal folds, air turbulence, or plosion.
2. A tube through which the source passes: the vocal tract.
3. Radiation of sound from the mouth.
These three components are assumed to be independent.
We will discuss these three parts separately.

3
The Source-Filter Model: Sound Source

Voiced Sound Source
  Produced by vibration of the vocal folds.
  Several models exist that describe the flow of air through the vocal folds.
  Each model describes the increase in air flow as the glottis opens, the decrease in air flow as it closes, and no air flow while the glottis remains closed during pressure buildup.
  In the spectral domain, the shape is approximately flat at very low frequencies and falls off at 12 dB/octave at higher frequencies.
  Models: Rosenberg, Fant (LF model), Fujisaki (FL model), Klatt.

[Figure: glottal pulse waveform, air pressure (µPa) vs. time (msec), with glottis opening and glottal closure marked]
4
The Source-Filter Model: Sound Source

Voiced Sound Source
  The models are of glottal flow.
  Glottal flow is the same as volume velocity, V, in units of m³/s.
  Volume velocity per unit area, V/unit area, is in units of m/s and is called the point velocity, v.
  Acoustic pressure, p, in pascals, equals impedance Z times v: $p = Z v$.
  Impedance is constant for a given glottis and vocal tract.
  Therefore, acoustic pressure is directly proportional to glottal flow, and so the vertical axis of these models can be read as glottal flow, volume velocity, or acoustic pressure (in micropascals).

5
The Source-Filter Model: Sound Source
All models have the following parameters:
  pitch period: T0 = 1/F0
  open quotient (OQ)
  skew (SK)
These three parameters are used in a function that describes how the sound pressure changes over time within one pitch period.
[Figure: glottal pulse with glottis opening and glottal closure marked, annotated with T0, OQ, and SK]
OQ is measured relative to T0; SK is measured relative to OQ.
6
The Source-Filter Model: Sound Source
The Rosenberg model
gR(t) is a glottal pulse with amplitude A and duration T. gR(t) has three phases: the opening phase until time TO, the closing phase until time TC, and the closed phase with length T - (TO + TC).
[Figure: Rosenberg glottal pulse with TO, TC, and T marked]
(from http://www.physik3.gwdg.de/micha/aachen98/aachen98.html)
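As a rough illustration of the three phases, the sketch below samples one period of a glottal pulse. It assumes the commonly quoted trigonometric form of the Rosenberg pulse (a raised-cosine opening and a quarter-cosine closing) and treats TO and TC as the durations of the opening and closing phases. The function names and the default values (F0 = 100 Hz, 16 kHz sampling, TO = 4 ms, TC = 2 ms) are illustrative assumptions, not values from the lecture.

```python
import math

def rosenberg_pulse(t, amplitude=1.0, t_open=0.004, t_close=0.002):
    """Glottal flow at time t (seconds) within one pitch period.

    t_open  : assumed duration of the opening phase (TO)
    t_close : assumed duration of the closing phase (TC)
    """
    if 0.0 <= t <= t_open:
        # Opening phase: flow rises smoothly from 0 to the peak amplitude.
        return 0.5 * amplitude * (1.0 - math.cos(math.pi * t / t_open))
    if t_open < t <= t_open + t_close:
        # Closing phase: flow falls from the peak back to 0.
        return amplitude * math.cos(math.pi * (t - t_open) / (2.0 * t_close))
    # Closed phase: no flow for the remaining T - (TO + TC).
    return 0.0

def sample_period(f0=100.0, fs=16000, amplitude=1.0, t_open=0.004, t_close=0.002):
    """Return one pitch period (T = 1/f0) of the pulse as a list of samples."""
    n_samples = int(round(fs / f0))
    return [rosenberg_pulse(i / fs, amplitude, t_open, t_close)
            for i in range(n_samples)]

pulse = sample_period()
```

Because the opening phase is longer than the closing phase here, the pulse is skewed to the right, which is the asymmetry that the skew parameter is meant to capture.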
7
The Source-Filter Model: Sound Source
The Liljencrants-Fant (LF) Model
  uses sin() and exp() functions to create a smooth trajectory
  many parameters allow detailed control of shape
The Fujisaki-Ljungqvist (FL) Model
  similar to LF, but allows negative flow during the closed phase
  simpler polynomial functions

(from http://www.ims.uni-stuttgart.de/phonetik/EGG/page13.htm)
8
The Source-Filter Model: Sound Source

Unvoiced Sound Source
  produced by pushing air through a constriction in the mouth
  a simple model: noise that decreases at 6 dB/octave
Plosive Sound Source
  produced by pressure buildup, then release of the constriction
  a very simple model: approximately a step function

[Figure: source waveform, amplitude vs. time]
9
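The Source-Filter Model: Vocal Tract Filter
The vocal tract can be modeled as a series of connected tubes with different lengths and diameters.
[Figure: vocal tract as six connected tubes with cross-sectional areas A1 to A6; the length l4 and diameter d4 of one tube are marked]
Life can be made much simpler if we start with only two tubes for approximating different vowels.
[Figure: two-tube approximations, with areas A1 and A2, for the vowels /iy/, /uw/, /ah/, and /aa/]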
10
The Source-Filter Model: Vocal Tract Filter
An electrical-engineering analogy can be drawn between the tubes and a transmission line. From this analogy, the formant frequencies (frequencies of standing waves) occur when
[Equation: resonance condition for the two-tube model, with its symbol definitions]
(from Flanagan, p. 70-71)
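For a lossless two-tube model, the standard transmission-line derivation gives resonances where A1 sin(βl1) sin(βl2) = A2 cos(βl1) cos(βl2), with β = 2πF/c (equivalently, tan(βl1) tan(βl2) = A2/A1). The sketch below finds those frequencies numerically. It assumes tube 1 (length l1, area A1) is closed at the glottis and tube 2 (length l2, area A2) is open at the lips, and it takes c = 340 m/s; the notation may differ from Flanagan's, and the function name and example dimensions are hypothetical.

```python
import math

def two_tube_formants(l1, a1, l2, a2, c=340.0, f_max=5000.0, step=1.0):
    """Resonances (Hz) of a lossless two-tube model, up to f_max.

    l1, a1: length (m) and area of the back tube (closed at the glottis)
    l2, a2: length (m) and area of the front tube (open at the lips)
    Only the area ratio matters, so any consistent area unit works.
    """
    def g(freq):
        # Zero of g <=> tan(beta*l1) * tan(beta*l2) = a2 / a1,
        # written without tan() so that g has no singularities.
        beta = 2.0 * math.pi * freq / c
        return (a1 * math.sin(beta * l1) * math.sin(beta * l2)
                - a2 * math.cos(beta * l1) * math.cos(beta * l2))

    formants = []
    lo = step
    while lo < f_max:
        hi = lo + step
        if g(hi) == 0.0:
            formants.append(hi)
        elif g(lo) * g(hi) < 0.0:
            # Sign change: refine the root by bisection.
            a, b = lo, hi
            for _ in range(50):
                mid = 0.5 * (a + b)
                if g(a) * g(mid) <= 0.0:
                    b = mid
                else:
                    a = mid
            formants.append(0.5 * (a + b))
        lo = hi
    return formants

# Equal areas reduce to a single uniform tube of length l1 + l2.
uniform = two_tube_formants(0.10, 1.0, 0.07, 1.0)

# Hypothetical narrow-back, wide-front shape (an /aa/-like configuration).
aa_like = two_tube_formants(0.09, 1.0, 0.08, 7.0)
```

With equal areas the condition collapses to cos(β(l1 + l2)) = 0, which gives a quick sanity check on the solver.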
11
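The Source-Filter Model: Vocal Tract Filter
In the simplest case of a single tube, the formants are located at
$F_n = \frac{(2n-1)\,c}{4l}, \quad n = 1, 2, 3, \ldots$
and if l = 17 cm (the typical length of the male vocal tract), then, taking the speed of sound as c ≈ 340 m/s,
$F_1 = \frac{340}{4 \times 0.17} = 500$ Hz, $F_2 = 1500$ Hz, $F_3 = 2500$ Hz,
etc. So, for a neutral vowel (no constriction in the vocal tract), formants occur at 500, 1500, 2500, ... Hz.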
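The arithmetic for the single-tube case can be checked with a few lines of code. This is a minimal sketch assuming the quarter-wavelength resonances of a uniform tube closed at one end, F_n = (2n - 1)c / (4l), and a speed of sound of 340 m/s; the function name is hypothetical.

```python
def single_tube_formants(length_m=0.17, speed_of_sound=340.0, count=3):
    """First `count` resonances (Hz) of a uniform tube closed at one end."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]

formants = single_tube_formants()
rounded = [round(f) for f in formants]
```

A shorter tube scales every formant up by the same factor, which is one reason shorter vocal tracts tend to have higher formants.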
12
The Source-Filter Model: Vocal Tract Filter
13
The Source-Filter Model: Vocal Tract Filter
14
The Source-Filter Model: Vocal Tract Filter
The two-tube model can be expanded to multiple tubes; the math becomes ugly, but the results are more realistic.
15
The Source-Filter Model: Bandwidths

In these cases, it has been assumed that the tubes have hard surfaces, which causes the resonant frequencies (formants) to have strong energy only at their center frequencies (energy is put into the system via the source, but no energy is lost).
In reality, the resonant energies decay over time; energy is absorbed by
  viscosity (caused by friction of air against the vocal-tract walls),
  heat conduction (at the vocal-tract walls),
  soft surfaces of the vocal-tract walls.
These effects cause bandwidth to increase with frequency.

16
The Source-Filter Model: Radiation
A final effect of the speech-production process is radiation of sound from the lips. As sound radiates from a source, its energy decreases. The decrease in energy is not the same for all frequencies; this effect can be modeled as a 6 dB/octave increase in energy, which, coincidentally, is the same equation as pre-emphasis with a = 1.0 and also corresponds to a differentiation operation.
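The link between pre-emphasis, differentiation, and the 6 dB/octave slope can be seen in a short sketch. It assumes the usual first-order pre-emphasis filter y[n] = x[n] - a·x[n-1]; with a = 1.0 this is a first difference, the discrete counterpart of a derivative. The function names and the 16 kHz sampling rate are illustrative assumptions.

```python
import math

def first_difference(x, a=1.0):
    """Pre-emphasis filter y[n] = x[n] - a * x[n-1], with x[-1] taken as 0."""
    y, prev = [], 0.0
    for sample in x:
        y.append(sample - a * prev)
        prev = sample
    return y

def gain_db(freq_hz, fs):
    """Magnitude response of the a = 1.0 filter, |1 - exp(-j*2*pi*f/fs)|, in dB."""
    return 20.0 * math.log10(2.0 * math.sin(math.pi * freq_hz / fs))

# Differencing a step function leaves a single impulse.
step = [0.0] * 3 + [1.0] * 5
impulse = first_difference(step)

# Doubling the frequency raises the gain by roughly 6 dB (well below fs/2).
octave_gain = gain_db(200.0, 16000) - gain_db(100.0, 16000)
```

The same operation applied to a glottal-flow pulse yields the glottal flow derivative.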
17
The Source-Filter Model: Radiation
The derivative effect of radiation from the lips can be moved to the glottal-source model.
[Figure: glottal flow, annotated with T0, OQ, and SK, and the corresponding glottal flow derivative]
18
The Source-Filter Model: Radiation
The derivative effect of radiation from the lips can also be moved to the models of frication and plosion.

Unvoiced Sound Source
  a very simple model: random (white) noise
Plosive Sound Source
  a very simple model: an impulse function

[Figure: source waveform, amplitude vs. time]
19
The Source-Filter Model: Complete Picture
[Figure: spectra of the glottal source (harmonics), radiation (log scale), and vocal tract filter (envelope), combining into the final speech signal]
20
The Source-Filter Model: Estimating Parameters
The vocal-tract parameters (formants) can be estimated using LPC analysis, with the order of LPC analysis equal to 2NF, where NF is the expected number of formants. In practice, LPC estimation of formants is not very accurate because of the slope of the spectrum and irregularities in the spectrum. Once the formants are determined, they can be inverted, and the original signal filtered with the inverted formants to obtain the source + radiation (first derivative of glottal flow) signal. This is called inverse filtering.
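The LPC and inverse-filtering steps can be sketched on a toy signal. The example below assumes the autocorrelation method with the Levinson-Durbin recursion, and uses a synthetic signal with a single formant (NF = 1, so the LPC order is 2), which lets the formant be read directly from the second-order polynomial. All function names and the test values (500 Hz, 80 Hz bandwidth, 8 kHz sampling) are hypothetical; real speech would also need windowing, a higher order, and a polynomial root finder.

```python
import math

def synthesize_resonance(freq, bw, fs, n_samples):
    """Impulse response of a two-pole resonator (one synthetic formant)."""
    r = math.exp(-math.pi * bw / fs)
    theta = 2.0 * math.pi * freq / fs
    b, c = 2.0 * r * math.cos(theta), -r * r
    x = []
    for n in range(n_samples):
        excitation = 1.0 if n == 0 else 0.0
        y1 = x[n - 1] if n >= 1 else 0.0
        y2 = x[n - 2] if n >= 2 else 0.0
        x.append(excitation + b * y1 + c * y2)
    return x

def autocorrelation(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return inverse-filter coefficients [1, a1, ..., ap] of A(z)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def second_order_formant(a, fs):
    """Formant frequency and bandwidth (Hz) from A(z) = 1 + a1/z + a2/z^2."""
    radius = math.sqrt(a[2])
    theta = math.acos(-a[1] / (2.0 * radius))
    return theta * fs / (2.0 * math.pi), -math.log(radius) * fs / math.pi

def inverse_filter(x, a):
    """Apply A(z) to x: e[n] = sum_k a[k] * x[n-k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

fs = 8000
signal = synthesize_resonance(500.0, 80.0, fs, 2000)
coeffs = levinson_durbin(autocorrelation(signal, 2), 2)
formant, bandwidth = second_order_formant(coeffs, fs)
residual = inverse_filter(signal, coeffs)  # estimate of the source
```

On this idealized signal the residual collapses back to the original impulse; on real speech it would approximate the glottal flow derivative, with errors from the inaccuracies noted above.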
21
The Source-Filter Model: Filtering
Formants can be modeled by a damped sinusoid, which has the following representations:
[Equations: damped-sinusoid representations]
where S(f) is the spectrum at frequency value f, A is overall amplitude, fc is the center frequency of the damped sine wave, and α is a damping factor (Olive, p. 48, 58).
Or, given formant and sampling frequency, compute IIR filter coefficients:
[Equations: IIR filter coefficients]
(from Klatt, 1980)
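A minimal sketch of the IIR approach, assuming the two-pole digital resonator commonly attributed to Klatt (1980): y[n] = A·x[n] + B·y[n-1] + C·y[n-2], with C = -exp(-2π·BW·T), B = 2·exp(-π·BW·T)·cos(2π·F·T), and A = 1 - B - C, where T is the sampling period. Note that a bandwidth is needed in addition to the formant and sampling frequency. The function names and example values are hypothetical.

```python
import math

def klatt_resonator_coefficients(formant_hz, bandwidth_hz, fs):
    """Coefficients (A, B, C) of a two-pole resonator with unity gain at DC."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * formant_hz * t))
    a = 1.0 - b - c
    return a, b, c

def resonate(x, a, b, c):
    """Filter x with y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = a * sample + b * y1 + c * y2
        y.append(out)
        y2, y1 = y1, out
    return y

# Example: a 500 Hz formant with 60 Hz bandwidth at a 10 kHz sampling rate.
A, B, C = klatt_resonator_coefficients(500.0, 60.0, 10000)
step_response = resonate([1.0] * 3000, A, B, C)
```

Several such resonators, one per formant, can be cascaded to approximate a vocal-tract filter; the excitation could be any of the source models described earlier.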
22
The Source-Filter Model

A course project that studies the source-filter model might be interesting if you're more comfortable with programming, signal processing, etc.:
  Implement LPC and extract formant values and bandwidths of different vowels; how do envelope and formant values change with different orders of LPC (values of p)?
  Do LPC analysis, then inverse filter the signal to extract the glottal source waveform. Does it look the way it should?
  Construct two-tube models and predict formant frequencies of all vowels.