Speech Analysis

Description:

Speech spectrogram. Derived formant tracks. A/D conversion ... Wide-band and narrow-band spectrograms. Mel-frequency filter bank ... – PowerPoint PPT presentation

Slides: 31

Provided by: philipj5

1
Speech Analysis
EEM.ssr Speaker and Speech Recognition

by
Dr Philip Jackson

Lecturer in Speech and Audio, Centre for Vision,
Speech and Signal Processing, Department of
Electronic Engineering.
http://www.ee.surrey.ac.uk/Teaching/Courses/eem.ssr
2
What's the point of analysing speech?

Speech analysis, or speech processing, transforms
a speech waveform into a representation from which
its features can be extracted, either by
human visual inspection, e.g., by a speech scientist,
speech therapist, or forensic phonetician, or
computer analysis, e.g., for automatic speech
recognition, speaker recognition, or paralinguistic
processing

3
And what does that mean?

"Suitable" could mean
amenable to human visual inspection
using a small number of bits per second (for
transmission or storage)
compatible with the models in a speech recognizer
in line with our understanding of human auditory
processing

4
Cochlear section

The cochlea, the auditory part of the inner ear, has a spiral form
Structures labelled in the cross-section: vestibular canal,
basilar membrane, tympanic canal, auditory nerve

5
Response of the cochlea
6
Basilar membrane

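sound enters the cochlea via the stapes
a travelling wave moves along the basilar membrane
the membrane vibrates most strongly at the position
matching the sound's frequency (high frequencies near
the base, low frequencies near the apex)
this activates the auditory nerve fibres at that position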

7
Short-term spectrum

Represents the distribution of power with respect
to frequency over a short time interval centred at
time t, like a vertical slice through the
spectrogram
From a source-filter perspective, it gives us
some information about the shape of the vocal
tract (the filter) at time t
From a human speech perception view, it provides
similar information to that carried from the cochlea
by the auditory nerve

8
Computing the ST-spectrum

Analogue-to-Digital (A/D) Conversion
convert the analogue signal from the microphone
into a digital signal
Windowing
select a short section of speech, centred at time
t, and smooth its edges by tapering them towards zero
Frequency analysis
estimate the distribution of power with respect
to frequency
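As a rough illustration of the windowing step, the Python sketch below cuts out a frame centred at time t and tapers it. The Hamming window, the 25 ms frame length in the usage example, zero-padding beyond the ends of the signal, and the helper name `windowed_frame` are assumptions for illustration; the slides do not specify them.

```python
import math


def hamming(N):
    """Symmetric Hamming window of length N (N >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def windowed_frame(x, fs, t, duration):
    """Return a Hamming-tapered frame of `duration` seconds centred at time t.

    x is a list of samples at rate fs (Hz); samples outside x are taken as 0.
    """
    N = int(round(duration * fs))          # frame length in samples
    start = int(round(t * fs)) - N // 2    # first sample, so the frame is centred at t
    w = hamming(N)
    frame = []
    for n in range(N):
        idx = start + n
        sample = x[idx] if 0 <= idx < len(x) else 0.0  # zero-pad beyond the signal
        frame.append(sample * w[n])
    return frame


# Usage: a 25 ms frame centred at t = 0.5 s of a 200 Hz tone sampled at 16 kHz
fs = 16000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
frame = windowed_frame(x, fs, t=0.5, duration=0.025)
print(len(frame))  # 400
```

The frame would then be passed to the frequency-analysis step.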

9
Waterfall display
10
Speech spectrogram
11
Derived formant tracks
12
A/D conversion

Sampling measures the speech signal at regular
time intervals, giving samples indexed by n
Quantisation encodes each sample x(n) with one of
a finite set of discrete values
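To make the two steps concrete, here is a small Python sketch that samples a test tone and then quantises it. The uniform mid-tread quantiser, the ±1 full-scale range, and the helper names `sample` and `quantise` are assumptions chosen for illustration; practical A/D converters may differ.

```python
import math


def sample(f, fs, duration):
    """Sample a continuous-time function f(t) every 1/fs seconds."""
    n_samples = int(round(duration * fs))
    return [f(n / fs) for n in range(n_samples)]


def quantise(x, bits, full_scale=1.0):
    """Uniform mid-tread quantiser: map each sample to one of 2**bits integer codes."""
    levels = 2 ** bits
    step = 2 * full_scale / levels            # quantisation step size
    lo, hi = -levels // 2, levels // 2 - 1    # range of available integer codes
    codes = []
    for v in x:
        q = int(round(v / step))
        codes.append(max(lo, min(hi, q)))     # clip values beyond full scale
    return codes, step


# Usage: 1 ms of a 1 kHz tone at 8 kHz, quantised to 8 bits
fs = 8000
x = sample(lambda t: 0.5 * math.sin(2 * math.pi * 1000 * t), fs, 0.001)
codes, step = quantise(x, bits=8)
x_hat = [c * step for c in codes]   # reconstructed (quantised) values
```

Unclipped samples differ from their quantised values by at most half a step.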

13
Sample rate

Nyquist's theorem: for a signal band-limited to B
Hz, a sample rate of at least 2B samples per second
is needed to encode the signal faithfully
The human ear is sensitive up to about 20 kHz (hence
the 44.1 kHz sample rate used for CDs)
But for speech
high quality needs a 10 kHz bandwidth, i.e., a 20 kHz
sample rate
the bandwidth can be reduced to 4 kHz (8 kHz sample
rate) for telephone quality
e.g., 8-bit PCM at 8 kHz gives 8 bits × 8000 samples/s = 64 kbit/s
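The arithmetic on this slide can be checked with a few lines of Python. The function names are hypothetical; the figures (10 kHz and 4 kHz bandwidths, 8-bit PCM at 8 kHz) are the ones quoted above.

```python
def nyquist_rate(bandwidth_hz):
    """Minimum sample rate (samples/s) for a signal band-limited to bandwidth_hz."""
    return 2 * bandwidth_hz


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate (bit/s) of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


for label, bandwidth in [("high-quality speech", 10_000), ("telephone speech", 4_000)]:
    print(f"{label}: B = {bandwidth} Hz needs fs >= {nyquist_rate(bandwidth)} Hz")

print(pcm_bit_rate(8_000, 8))  # 64000 bit/s, i.e. 64 kbit/s
```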

14
CD quality, fS = 44.1 kHz
15
High-quality speech, fS = 20 kHz
16
Telephone speech, fS = 8 kHz
17
Window functions
18
Frequency analysis

The Discrete Fourier Transform (DFT) is applied to
the windowed digital waveform $x(n)$, $n = 1, \dots, N$.
With an N-sample window, an N-point complex
spectrum $X(k)$, $k = 1, \dots, N$, is obtained.
The modulus squared gives the power spectrum,
$|X(k)|^2$.
The logarithm gives the log-power spectrum,
$\log |X(k)|^2$.
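The three quantities in this list can be computed directly, as in the Python sketch below. It uses a direct O(N^2) DFT with 0-based indices (Python's convention, so index k here corresponds to k+1 on the slide) rather than an FFT, and it expresses the log-power spectrum in decibels (10 log10); both choices, and the small floor that avoids log(0), are assumptions for illustration.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT, X(k) = sum_n x(n) exp(-j 2 pi k n / N), with 0-based n and k."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def power_spectrum(x):
    """Modulus squared of each DFT value, |X(k)|^2."""
    return [abs(X) ** 2 for X in dft(x)]


def log_power_spectrum(x, floor=1e-12):
    """Log-power spectrum in dB, 10 log10 |X(k)|^2; the floor avoids log(0)."""
    return [10 * math.log10(max(p, floor)) for p in power_spectrum(x)]


# Usage: a cosine with exactly 4 cycles in a 32-sample frame
N = 32
x = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]
P = power_spectrum(x)
print(max(range(N // 2), key=lambda k: P[k]))  # 4
```

Because the tone completes a whole number of cycles in the frame, all its power falls in bin 4 (and its mirror image, bin 28).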

19
Discrete Fourier transform

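Applies to a signal observed over a finite period of time
and sampled at regular intervals

Forward transform
$$X(k) = \sum_{n=1}^{N} x(n)\, e^{-j 2\pi (k-1)(n-1)/N}, \qquad k = 1, \dots, N$$
Inverse transform
$$x(n) = \frac{1}{N} \sum_{k=1}^{N} X(k)\, e^{j 2\pi (k-1)(n-1)/N}, \qquad n = 1, \dots, N$$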
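A direct Python transcription of the forward and inverse DFT is sketched below as a check that the transform is reversible. The slides index samples and frequency bins from 1 to N; Python lists are 0-based, so `n` and `k` in the code correspond to n-1 and k-1. The O(N^2) loops are for clarity only; an FFT would normally be used instead.

```python
import cmath
import math


def dft(x):
    """Forward DFT: X(k) = sum_n x(n) exp(-j 2 pi k n / N), 0-based n and k."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse DFT: x(n) = (1/N) sum_k X(k) exp(+j 2 pi k n / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


x = [0.0, 1.0, 0.5, -0.25, -1.0, 0.3]
x_back = idft(dft(x))
# Reversibility: the round trip recovers x up to floating-point rounding
print(max(abs(a - b) for a, b in zip(x, x_back)))
```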
20
Frequency analysis

Alternative methods include
filter-bank analysis (based on a set of band-pass
filters)
approximations of the spectral envelope, e.g.,
linear predictive coding (LPC)
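As one example of spectral-envelope approximation, the sketch below estimates LPC coefficients by the autocorrelation method with the Levinson-Durbin recursion, then evaluates the resulting all-pole envelope. This is the textbook autocorrelation formulation, not necessarily the one used in the course; the helper names, the sign convention x(n) ≈ Σ a(i) x(n-i), and the use of the final prediction-error power as the envelope gain are assumptions.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation r(0..max_lag) of a (windowed) frame x."""
    N = len(x)
    return [sum(x[n] * x[n - lag] for n in range(lag, N)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for a(1..order), with x(n) ~ sum_i a(i) x(n-i).

    Returns (a, err), where err is the final prediction-error power.
    """
    a = [0.0] * (order + 1)  # a[0] is unused so that a[i] matches a(i)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_envelope(a, gain, n_points):
    """All-pole envelope gain / |1 - sum_i a(i) e^{-jwi}|^2 at w = pi*m/n_points."""
    env = []
    for m in range(n_points):
        w = math.pi * m / n_points
        A = 1 - sum(ai * cmath.exp(-1j * w * (i + 1)) for i, ai in enumerate(a))
        env.append(gain / abs(A) ** 2)
    return env


# Usage: r(lag) = 0.5**lag is the autocorrelation of a first-order process with a(1) = 0.5
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, err)  # [0.5, 0.0] 0.75
```

In practice `r` would come from `autocorrelation` applied to a windowed speech frame, with an order of roughly 10 to 16 for 8 to 16 kHz speech.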

21
Time-frequency resolution 1

If the window is long, then
the time resolution is poor
the number of points, N, is large
there are N points in the spectrum, spaced fS/N apart
so the frequency resolution is fine
this is narrow-band frequency analysis, giving a
narrow-band spectrum

22
Narrow-band spectrum
23
Time-frequency resolution 2

If the window is short, then
the time resolution is good
the number of points, N, is small
there are N points in the spectrum, spaced fS/N apart
so the frequency resolution is coarse
this is broad-band frequency analysis, giving a
broad-band spectrum
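The trade-off described in the two lists above can be put into numbers. Taking the window duration N/fS as the time resolution and the DFT bin spacing fS/N as a rough measure of frequency resolution (the true resolution also depends on the window shape), the sketch below compares a long and a short window. The 16 kHz sample rate and the window lengths of 512 and 64 samples are assumed values.

```python
def resolution(n_points, fs):
    """Return (window duration in s, DFT bin spacing in Hz) for an n_points window at fs Hz."""
    return n_points / fs, fs / n_points


fs = 16_000  # assumed sample rate in Hz
for n_points in (512, 64):  # a long window and a short window
    duration, spacing = resolution(n_points, fs)
    print(f"N = {n_points:3d}: {duration * 1e3:4.1f} ms window, "
          f"{spacing:6.2f} Hz bin spacing, product = {duration * spacing:.2f}")
```

With these measures the product is 1 for every N: shortening the window by a factor of eight makes the bin spacing eight times coarser.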

24
Wide-band spectrum
25
Time-frequency resolution 3

In summary
long window, narrow-band spectrum
short window, broad-band spectrum.
Indeed, the bandwidth-time product cannot be less
than a half (the time-frequency uncertainty principle),
so finer time resolution always costs frequency
resolution, and vice versa

26
Wide-band and narrow-band spectrograms
27
Mel-frequency filter bank

Allocation of DFT bins to a bank of filters whose
centre frequencies are spaced according to the Mel scale
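A minimal Python sketch of this allocation follows. It assumes the common formula mel(f) = 2595 log10(1 + f/700), triangular filters whose edges and centres are equally spaced in mel between 0 Hz and fS/2, and bins 0 to N/2 of an N-point DFT; the slides do not specify these details, and the function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """Mel value of frequency f in Hz (2595 log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular filters over DFT bins 0..n_fft//2, equally spaced in mel from 0 to fs/2.

    Returns n_filters lists of weights, one weight per DFT bin.
    """
    m_top = hz_to_mel(fs / 2)
    # n_filters + 2 points: each filter uses three neighbours as (left, centre, right)
    edges = [mel_to_hz(i * m_top / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = edges[i - 1], edges[i], edges[i + 1]
        weights = []
        for f in bin_freqs:
            if left < f <= centre:
                weights.append((f - left) / (centre - left))     # rising slope
            elif centre < f < right:
                weights.append((right - f) / (right - centre))   # falling slope
            else:
                weights.append(0.0)                              # bin outside this filter
        bank.append(weights)
    return bank


def apply_filter_bank(bank, power):
    """Filter-bank energies: weighted sums of DFT-bin powers (n_fft//2 + 1 values)."""
    return [sum(w * p for w, p in zip(weights, power)) for weights in bank]


bank = mel_filter_bank(n_filters=10, n_fft=512, fs=16000)
energies = apply_filter_bank(bank, [1.0] * 257)  # flat power spectrum as a toy input
```

Because the filters are equally spaced in mel, the high-frequency filters span many more DFT bins than the low-frequency ones.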

28
The real cepstrum

Procedure for computing cepstral coefficients
from the magnitude spectrum: take the logarithm of
the magnitude spectrum, then apply an inverse DFT
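One common definition of the real cepstrum is the inverse DFT of the log magnitude spectrum, c(n) = IDFT{log|X(k)|}. The sketch below follows that definition with a direct O(N^2) DFT; the floor used to avoid log(0) and the function names are assumptions. The usage example takes an impulse plus a half-amplitude echo 8 samples later: the echo shows up as a cepstral peak at quefrency n = 8, the same effect that lets the cepstrum reveal the pitch period of voiced speech.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct DFT, or inverse DFT with 1/N scaling; 0-based indices."""
    N = len(x)
    sign = 1 if inverse else -1
    X = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
         for k in range(N)]
    return [v / N for v in X] if inverse else X


def real_cepstrum(frame, floor=1e-12):
    """c(n) = IDFT{ log |X(k)| }, returned as real values."""
    X = dft(frame)
    log_mag = [math.log(max(abs(v), floor)) for v in X]
    return [v.real for v in dft(log_mag, inverse=True)]


# Usage: impulse plus an echo of amplitude 0.5 delayed by 8 samples (N = 64)
N, delay = 64, 8
x = [0.0] * N
x[0], x[delay] = 1.0, 0.5
c = real_cepstrum(x)
print(max(range(1, N // 2), key=lambda n: c[n]))  # 8
```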

29
Mel-frequency cepstrum

Procedure for computing cepstral coefficients,
based on the output from Mel-frequency binning: take
the logarithm of each filter output, then apply a
discrete cosine transform (DCT)
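Given the filter-bank energies for one frame (for instance, DFT-bin powers summed by Mel-spaced triangular filters), a minimal sketch of the usual next steps is to take the logarithm of each energy and then apply a type-II DCT. The unscaled DCT-II, the floor, the number of coefficients kept, and the function name are assumptions; toolkits differ in scaling and in any liftering applied afterwards.

```python
import math


def cepstrum_from_filter_bank(energies, n_coeffs, floor=1e-12):
    """Return c(0..n_coeffs-1) = DCT-II of the log filter-bank energies (unscaled)."""
    M = len(energies)
    log_e = [math.log(max(e, floor)) for e in energies]   # log compresses dynamic range
    return [sum(log_e[m] * math.cos(math.pi * n * (m + 0.5) / M) for m in range(M))
            for n in range(n_coeffs)]


# Usage: 20 hypothetical filter energies with a gently falling spectral tilt
energies = [1.0 / (1 + 0.1 * m) for m in range(20)]
mfcc = cepstrum_from_filter_bank(energies, n_coeffs=13)
print([round(c, 3) for c in mfcc[:4]])
```

A perfectly flat set of energies puts everything into c(0), while a falling tilt gives a positive c(1), so the low-order coefficients summarise the overall spectral shape.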

30
Summary of Fourier analysis

Fourier analysis leads to a frequency-domain representation that is
good for visualisation
reversible
available in continuous- and discrete-time forms
Wide- and narrow-band spectra are obtained by
adjusting the frame size
Windowing
reduces spectral smearing
allows for adaptation
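The claim that windowing reduces spectral smearing can be checked numerically. The sketch below analyses a cosine that falls halfway between two DFT bins, so its energy cannot land in a single bin, and compares how much leaks into a distant bin with a rectangular window (no tapering) and with a Hann window. The frame length, tone frequency, choice of distant bin, and the Hann window itself are assumptions for illustration; the exact dB figures depend on them.

```python
import cmath
import math


def dft_magnitude(x):
    """|X(k)| for k = 0..N-1, via a direct DFT."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N)]


def rectangular(n, N):
    return 1.0  # no tapering


def hann(n, N):
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / N)  # periodic Hann window


def leakage_db(window, N=64, cycles=10.5, far_bin=25):
    """Level of a distant DFT bin relative to the spectral peak, in dB.

    The test tone has a non-integer number of cycles per frame, so it falls between bins.
    """
    x = [math.cos(2 * math.pi * cycles * n / N) * window(n, N) for n in range(N)]
    mag = dft_magnitude(x)
    return 20 * math.log10(mag[far_bin] / max(mag))


for name, w in [("rectangular", rectangular), ("hann", hann)]:
    print(f"{name:12s} {leakage_db(w):7.1f} dB")
```

The tapered window pushes the distant bin's level far lower, at the cost of a slightly wider main lobe around the tone.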
