Title: Speech Analysis
1 Speech Analysis
EEM.ssr Speaker & Speech Recognition
Lecturer in Speech & Audio, Centre for Vision,
Speech & Signal Processing, Department of
Electronic Engineering.
http://www.ee.surrey.ac.uk/Teaching/Courses/eem.ssr
2 What's the point of analysing speech?
- Speech analysis, or speech processing, transforms
a speech waveform into a representation that is
suitable for extracting its features
- Human visual inspection
- e.g., by a speech scientist, speech therapist, or
forensic phonetician
- Computer analysis
- e.g., for automatic speech recognition, speaker
recognition, or paralinguistic processing
3 And what does that mean?
- "Suitable" could be
- amenable to human visual inspection
- using a small number of bits per second (for
transmission or storage)
- compatible with the models in a speech recognizer
- in line with our understanding of human auditory
processing
4 Cochlear section
- Cochlea, or inner ear, has a spiral form
- vestibular canal
- basilar membrane
- tympanic canal
- auditory nerve
5 Response of the cochlea
6 Basilar membrane
- sound enters at the stapes
- travels along the basilar membrane
- vibrates at the position matching its frequency
- activates auditory nerves
7 Short-term spectrum
- Represents the distribution of power with respect
to frequency over a time interval centred at
time t, like a vertical slice through the
spectrogram
- From a source-filter perspective, it gives us
some information about the shape of the vocal
tract at time t
- From a human speech perception view, it provides
similar information to that sent from the cochlea
to the auditory nerve
8 Computing the ST-spectrum
- Analogue-to-Digital (A/D) Conversion
- convert the analogue signal from the microphone
into a digital signal
- Windowing
- select a short section of speech, centred at time
t, and smooth
- Frequency analysis
- estimate the distribution of power with respect
to frequency
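The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own implementation: the function name `short_term_spectrum`, the 25 ms default window, the Hamming taper and the synthetic 1 kHz tone standing in for an A/D-converted signal are all assumptions made for the example.

```python
import numpy as np

def short_term_spectrum(x, fs, t, win_dur=0.025):
    """Log-power spectrum (dB) of the section of x centred at time t (seconds)."""
    n_win = int(round(win_dur * fs))           # window length in samples
    centre = int(round(t * fs))                # sample index at time t
    start = centre - n_win // 2
    if start < 0 or start + n_win > len(x):
        raise ValueError("window falls outside the signal")
    frame = x[start:start + n_win]             # windowing: select a short section...
    frame = frame * np.hamming(n_win)          # ...and smooth its edges (assumed taper)
    spectrum = np.fft.rfft(frame)              # frequency analysis via the DFT
    power = np.abs(spectrum) ** 2              # distribution of power over frequency
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    return freqs, 10.0 * np.log10(power + 1e-12)  # small floor avoids log(0)

# Stand-in for an A/D-converted signal: one second of a 1 kHz tone at 8 kHz.
fs = 8000
n = np.arange(fs)
x = np.sin(2 * np.pi * 1000 * n / fs)

freqs, log_power = short_term_spectrum(x, fs, t=0.5)
peak_hz = freqs[np.argmax(log_power)]          # frequency of the strongest bin
```

Calling the function at a sequence of times t and stacking the results column by column would give a spectrogram, of which each short-term spectrum is one vertical slice.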
9 Waterfall display
10 Speech spectrogram
11 Derived formant tracks
12 A/D conversion
- Sampling measures the speech signal at regular
intervals, n
- Quantisation encodes the signal $x(n)$ with a
discrete value
(Figure: signal value x against sample index n)
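As a rough illustration of sampling followed by quantisation, the sketch below samples a continuous-time function at regular intervals and encodes each sample with a uniform quantiser. The helper name `sample_and_quantise`, the [-1, 1) input range, the mid-cell reconstruction and the 200 Hz test tone are assumptions chosen for the example; real converters differ in detail.

```python
import numpy as np

def sample_and_quantise(signal_fn, fs, duration, n_bits):
    """Sample signal_fn at fs Hz, then quantise uniformly to n_bits per sample."""
    n = np.arange(int(round(duration * fs)))   # sample index n
    x = signal_fn(n / fs)                      # sampling: x(n) = x_c(n / fs)
    levels = 2 ** n_bits
    # Uniform quantiser over [-1, 1): map each sample to an integer code.
    codes = np.clip(np.floor((x + 1.0) / 2.0 * levels), 0, levels - 1).astype(int)
    # Reconstruct each sample at the centre of its quantisation cell.
    x_hat = (codes + 0.5) / levels * 2.0 - 1.0
    return n, x, codes, x_hat

n, x, codes, x_hat = sample_and_quantise(
    lambda t: 0.9 * np.sin(2 * np.pi * 200 * t),
    fs=8000, duration=0.01, n_bits=8)

# For in-range input the error is at most half a quantisation step.
max_error = np.max(np.abs(x - x_hat))
half_step = 1.0 / 2 ** 8
```

Each extra bit halves the step size, so the worst-case quantisation error halves too.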
13 Sample rate
- Nyquist's theorem: for a signal band-limited to B
Hz, a rate of at least 2B samples per second is
needed to encode the signal faithfully
- Human ear is sensitive up to 20 kHz (hence the
44.1 kHz rate for CDs)
- But for speech
- high quality needs 10 kHz bandwidth, i.e., 20 kHz
sample rate
- bandwidth can be reduced to 4 kHz (8 kHz rate),
for telephone quality
- e.g., 8-bit PCM at 8 kHz = 64 kbps
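The arithmetic on this slide can be checked directly. The helper names below (`nyquist_rate`, `pcm_bit_rate`, `alias_frequency`) are hypothetical, introduced only for this sketch; the last one shows what happens when the Nyquist condition is violated, assuming ideal sampling with no anti-aliasing filter.

```python
def nyquist_rate(bandwidth_hz):
    """Minimum sample rate for a signal band-limited to bandwidth_hz."""
    return 2 * bandwidth_hz

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a tone at f_hz after ideal sampling at fs_hz."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

high_quality_fs = nyquist_rate(10_000)          # 20 kHz sample rate
telephone_fs = nyquist_rate(4_000)              # 8 kHz sample rate
telephone_bps = pcm_bit_rate(telephone_fs, 8)   # 8-bit PCM at 8 kHz

# A 5 kHz tone lies above the 4 kHz telephone bandwidth, so it aliases.
aliased = alias_frequency(5_000, telephone_fs)
in_band = alias_frequency(3_000, telephone_fs)
```

Here `telephone_bps` comes to 64 000 bits per second, matching the 64 kbps figure, and the out-of-band 5 kHz tone reappears at 3 kHz.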
14 CD-quality: fS = 44.1 kHz
15 High-quality speech: fS = 20 kHz
16 Telephone speech: fS = 8 kHz
17 Window functions
18 Frequency analysis
- Discrete Fourier Transform (DFT) is applied to
the windowed digital waveform $x(n)$, $n = 1, \dots, N$.
- With an N-sample window, an N-point complex
spectrum is obtained: $X(k)$, $k = 1, \dots, N$.
- The modulus squared gives the power spectrum,
$|X(k)|^2$
- The logarithm gives the log-power spectrum,
$\log |X(k)|^2$
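To make the chain from frame to log-power spectrum concrete, here is a small sketch that evaluates the DFT sum directly instead of calling an FFT routine. The function name `dft`, the 64-sample random frame and the Hann taper are assumptions for illustration; note that the code indexes from 0, whereas the slide indexes from 1.

```python
import numpy as np

def dft(x):
    """Direct N-point DFT of an N-sample frame (O(N^2), for illustration only)."""
    N = len(x)
    n = np.arange(N)                 # time index, 0-based
    k = n.reshape(-1, 1)             # frequency index, one row per bin
    return np.exp(-2j * np.pi * k * n / N) @ x

rng = np.random.default_rng(0)
frame = rng.standard_normal(64) * np.hanning(64)   # stand-in windowed frame

X = dft(frame)                       # N-point complex spectrum
power = np.abs(X) ** 2               # power spectrum: modulus squared
log_power = np.log(power + 1e-12)    # log-power spectrum (floor avoids log 0)
```

Because the frame is real-valued, the power spectrum is symmetric about bin N/2, which is why only the first half of the bins is usually plotted.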
19 Discrete Fourier transform
- over a finite period of time
- sampled at regular intervals
Forward transform
Inverse transform
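The equations for this slide did not survive in the text. For reference, the standard DFT pair is given below, written with 1-based indices to match the $n = 1, \dots, N$ and $k = 1, \dots, N$ convention used for frequency analysis; placing the factor $1/N$ on the inverse is a common convention and is assumed here, not taken from the slide.

```latex
% Forward transform (analysis)
X(k) = \sum_{n=1}^{N} x(n)\, e^{-j 2\pi (k-1)(n-1)/N},
\qquad k = 1, \dots, N

% Inverse transform (synthesis)
x(n) = \frac{1}{N} \sum_{k=1}^{N} X(k)\, e^{+j 2\pi (k-1)(n-1)/N},
\qquad n = 1, \dots, N
```

With a sample rate $f_S$, index $k$ corresponds to the frequency $(k-1) f_S / N$.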
20 Frequency analysis
- Alternative methods include
- filter-bank analysis (based on a set of band-pass
filters)
- approximations of the spectral envelope, e.g.,
linear predictive coding (LPC)
21 Time-frequency resolution 1
- If the window is long then
- the time resolution is poor
- the number of points, N, is large
- there are N points in the spectrum
- so there is fine frequency resolution
- narrow-band frequency analysis, or narrow-band
spectrum
22 Narrow-band spectrum
23 Time-frequency resolution 2
- If the window is short then
- the time resolution is good
- the number of points, N, is small
- there are N points in the spectrum
- so the frequency resolution is coarse
- broad-band frequency analysis, or broad-band
spectrum
24 Wide-band spectrum
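25 Time-frequency resolution 3
- In summary
- long window, narrow-band spectrum
- short window, broad-band spectrum.
- Indeed, time and frequency resolution cannot both
be made arbitrarily fine: the bandwidth-time product
cannot fall below a half, $\Delta t \, \Delta\omega \geq 1/2$
(RMS duration in seconds, RMS bandwidth in rad/s)
- for an N-point window the duration is $N / f_S$ and
the DFT bin spacing is $f_S / N$, where $f_S$ is the sample
rate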
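The trade-off can be put into numbers. The sketch below assumes a 16 kHz sample rate and two illustrative window lengths (32 ms and 4 ms); the helper `window_resolution` is hypothetical. It reports only the DFT bin spacing; the effective frequency resolution also depends on the main-lobe width of the window function, which is ignored here.

```python
def window_resolution(win_dur_s, fs):
    """Number of DFT points and bin spacing (Hz) for a window of win_dur_s seconds."""
    n_points = int(round(win_dur_s * fs))   # N samples in the window
    bin_spacing_hz = fs / n_points          # spacing between adjacent DFT bins
    return n_points, bin_spacing_hz

fs = 16000                                   # assumed sample rate
narrow_n, narrow_df = window_resolution(0.032, fs)   # long window
wide_n, wide_df = window_resolution(0.004, fs)       # short window

# Duration times bin spacing is the same for both windows:
# improving one resolution worsens the other by the same factor.
narrow_product = (narrow_n / fs) * narrow_df
wide_product = (wide_n / fs) * wide_df
```

The long window gives 512 points spaced 31.25 Hz apart, fine enough to separate individual harmonics at typical speaking pitches; the short window gives 64 points spaced 250 Hz apart but follows rapid changes in time.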
26 Wide-band and narrow-band spectrograms
27 Mel-frequency filter bank
- Allocation of DFT bins to filters, spaced
according to the Mel scale
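One way such an allocation might look is sketched below. The slide does not say which Mel formula or filter shape is used, so the common mapping mel = 2595 log10(1 + f/700) and triangular, half-overlapping filters are assumptions, as are the function names and the choice of 20 filters over a 512-point DFT at 16 kHz.

```python
import numpy as np

def hz_to_mel(f):
    """Frequency in Hz to mels (assumed 2595*log10(1 + f/700) mapping)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular weights over the rfft bins, with edges equally spaced in mel."""
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)     # slope up to the centre
        falling = (hi - bin_freqs) / (hi - mid)    # slope down from the centre
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank, edges_hz

bank, edges_hz = mel_filter_bank(n_filters=20, n_fft=512, fs=16000)
```

Each row of `bank` weights the DFT bins that belong to one filter; multiplying it by a power spectrum gives that filter's output. The filters are narrow at low frequencies and widen towards high frequencies, following the Mel scale.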
28 The real cepstrum
- Procedure for computing cepstral coefficients
from the magnitude spectrum
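The procedure diagram is not reproduced in the text, so the sketch below follows the usual definition of the real cepstrum: take the DFT, keep the log of the magnitude, and apply the inverse DFT. The function name, the random test frame and the small floor inside the logarithm are assumptions for the example.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # floor avoids log(0)
    # log_mag is real and even for a real frame, so the result is real.
    return np.fft.ifft(log_mag).real

rng = np.random.default_rng(1)
frame = rng.standard_normal(256) * np.hamming(256)   # stand-in windowed frame
cepstrum = real_cepstrum(frame)
```

The low-order coefficients describe the smooth spectral envelope, while a peak at higher quefrency in voiced speech reflects the pitch period.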
29 Mel-frequency cepstrum
- Procedure for computing cepstral coefficients,
based on the output from Mel-frequency binning
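A minimal sketch of one common version of this procedure: power spectrum, Mel-frequency binning, logarithm, then a discrete cosine transform. The details are assumptions and not taken from the slide: the 2595 log10(1 + f/700) Mel mapping, triangular filters, an unnormalised DCT-II, and the parameter choices of 20 filters and 12 coefficients.

```python
import numpy as np

def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular filters over rfft bins, edges equally spaced in mel (assumed form)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    edges = inv_mel(np.linspace(0.0, mel(fs / 2.0), n_filters + 2))
    lo, mid, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    tri = np.minimum((bin_freqs - lo) / (mid - lo), (hi - bin_freqs) / (hi - mid))
    return np.clip(tri, 0.0, None)

def dct_ii(v, n_out):
    """First n_out coefficients of the unnormalised DCT-II of v."""
    M = len(v)
    q = np.arange(n_out).reshape(-1, 1)
    m = np.arange(M)
    return np.cos(np.pi * q * (m + 0.5) / M) @ v

def mfcc_frame(frame, fs, n_filters=20, n_ceps=12):
    """Mel-frequency cepstral coefficients of a single frame."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2   # power spectrum
    mel_energies = mel_filter_bank(n_filters, n_fft, fs) @ power  # Mel binning
    log_mel = np.log(mel_energies + 1e-12)                        # log compression
    return dct_ii(log_mel, n_ceps)                                # cepstral coefficients

rng = np.random.default_rng(2)
frame = rng.standard_normal(512)          # stand-in for a 32 ms frame at 16 kHz
coeffs = mfcc_frame(frame, fs=16000)
```

Coefficient 0 tracks the overall log energy across the filters; some front ends drop it or replace it with a separate energy term.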
30 Summary of Fourier analysis
- Fourier leads to frequency representation
- good for visualisation
- is reversible
- continuous and discrete time forms
- Wide- and narrow-band spectra obtained by
adjusting frame size
- Windowing
- reduces spectral smearing
- allows for adaptation